Abstract. We develop a novel and powerful technique for communication lower bounds, the pattern matrix method. Specifically, fix an arbitrary function $f\colon\{0,1\}^n\to\{0,1\}$ and let $A_f$ be the matrix whose columns are each an application of $f$ to some subset of the variables $x_1,x_2,\ldots,x_{4n}$. We prove that $A_f$ has bounded-error communication complexity $\Omega(d)$, where $d$ is the approximate degree of $f$. This result remains valid in the quantum model, regardless of prior entanglement. In particular, it gives a new and simple proof of Razborov's breakthrough quantum lower bounds for disjointness and other symmetric predicates. We further characterize the discrepancy, approximate rank, and approximate trace norm of $A_f$ in terms of well-studied analytic properties of $f$, broadly generalizing several recent results on small-bias communication and agnostic learning. The method of this paper has recently enabled important progress in multiparty communication complexity.
1. Introduction. A central model in communication complexity is the bounded-error model. Let $f\colon X\times Y\to\{-1,+1\}$ be a given function, where $X$ and $Y$ are finite sets. Alice receives an input $x\in X$, Bob receives $y\in Y$, and their objective is to compute $f(x,y)$ with minimal communication. To this end, Alice and Bob share an unlimited supply of random bits. Their protocol is said to compute $f$ if on every input $(x,y)$, the output is correct with probability at least $1-\epsilon$. The canonical setting is $\epsilon=1/3$, but any other parameter $\epsilon\in(0,1/2)$ can be considered. The cost of a protocol is the worst-case number of bits exchanged on any input. Depending on the physical nature of the communication channel, one studies the classical model, in which the messages are classical bits 0 and 1, and the more powerful quantum model, in which the messages are quantum bits and arbitrary prior entanglement is allowed. The communication complexity in these models is denoted $R_\epsilon(f)$ and $Q^*_\epsilon(f)$, respectively.

Bounded-error protocols have been the focus of much research in communication complexity since the introduction of the area by Yao […] three decades ago. A variety of techniques have been developed for proving lower bounds on classical communication, e.g., […, 13, …]. There has been consistent progress on quantum communication as well [66, 3, 12, …, 56, 42], although quantum protocols remain much less understood than their classical counterparts.

The main contribution of this paper is a novel and powerful method for lower 
bounds on classical and quantum communication complexity, the pattern matrix 
method. The method converts analytic properties of Boolean functions into lower 
bounds for the corresponding communication problems. The analytic properties in 
question pertain to the approximation and sign-representation of a given Boolean 
function by real polynomials of low degree, which are among the oldest and most 
studied objects in theoretical computer science. In other words, the pattern matrix 
method takes the wealth of results available on the representations of Boolean
functions by real polynomials and puts them at the disposal of communication 
complexity. 

We consider two ways of representing Boolean functions by real polynomials. Let $f\colon\{0,1\}^n\to\{-1,+1\}$ be a given Boolean function. The $\epsilon$-approximate degree of $f$, denoted $\deg_\epsilon(f)$, is the least degree of a real polynomial $p$ such that $|f(x)-p(x)|\le\epsilon$ for all $x\in\{0,1\}^n$. There is an extensive literature on the $\epsilon$-approximate degree of Boolean functions [48, 50, 26, …, 58, 64], for the canonical setting $\epsilon=1/3$ and various other settings. Apart from uniform approximation, the other representation scheme of interest to us is sign-representation. Specifically, the degree-$d$ threshold weight $W(f,d)$ of $f$ is the minimum $\sum_{|S|\le d}|\lambda_S|$ over all integers $\lambda_S$ such that
$$f(x)=\operatorname{sgn}\Bigl(\sum_{S\subseteq\{1,\ldots,n\},\ |S|\le d}\lambda_S\chi_S(x)\Bigr),$$
where $\chi_S(x)=(-1)^{\sum_{i\in S}x_i}$. If no such integers $\lambda_S$ exist, we write $W(f,d)=\infty$. The threshold weight of Boolean functions has been heavily studied, both when $W(f,d)$ is infinite […, 37, …, 31, …] and when it is finite […, 51, 53]. The notions of uniform approximation and sign-representation are closely related, as we discuss in Section 2. Roughly speaking, the study of threshold weight corresponds to the study of the $\epsilon$-approximate degree for $\epsilon=1-o(1)$.
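For very small $n$ the definition of $W(f,d)$ can be applied literally. The Python sketch below tries integer coefficient vectors $(\lambda_S)_{|S|\le d}$ in order of increasing weight and returns the first weight that sign-represents $f$. The example is our own: OR on two bits, with $-1$ meaning "true". All function names are hypothetical, and the search is exponential, so it only serves to illustrate the definition. If it returns `None`, no representation exists up to `max_weight`. That is consistent with, but does not prove, $W(f,d)=\infty$.

```python
from itertools import combinations, product

def chi(S, x):
    """Character chi_S(x) = (-1)^(sum of x_i for i in S)."""
    return (-1) ** sum(x[i] for i in S)

def sgn(t):
    """Sign function with sgn(0) = 0."""
    return (t > 0) - (t < 0)

def represents(f, n, monomials, lam):
    """True if f(x) == sgn(sum_S lam_S * chi_S(x)) for every x in {0,1}^n."""
    for x in product((0, 1), repeat=n):
        total = sum(l * chi(S, x) for S, l in zip(monomials, lam))
        if sgn(total) != f(x):
            return False
    return True

def threshold_weight(f, n, d, max_weight=10):
    """Brute-force W(f, d): try integer vectors in order of increasing weight.

    Returns None if no sign-representation of weight <= max_weight exists.
    """
    monomials = [S for k in range(d + 1) for S in combinations(range(n), k)]
    for w in range(1, max_weight + 1):
        for lam in product(range(-w, w + 1), repeat=len(monomials)):
            if sum(abs(l) for l in lam) == w and represents(f, n, monomials, lam):
                return w
    return None

def or2(x):
    """OR on two bits, with -1 meaning "true"."""
    return -1 if any(x) else 1

print(threshold_weight(or2, 2, 1))  # prints 3, e.g. via sgn(chi_{1} + chi_{2} - 1)
```

Weights 1 and 2 fail for OR because every such combination is either constant, depends on one variable, or vanishes at some input. Parity on two bits has no degree-1 representation at all.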

Having defined uniform approximation and sign-representation for Boolean functions, we now describe how we use them to prove communication lower bounds. The central concept in our work is what we call a pattern matrix. Consider the communication problem of computing
$$f(x|_V),$$
where $f\colon\{0,1\}^t\to\{-1,+1\}$ is a fixed Boolean function; the string $x\in\{0,1\}^n$ is Alice's input ($n$ is a multiple of $t$); and the set $V\subset\{1,2,\ldots,n\}$ with $|V|=t$ is Bob's input. Here $x|_V\in\{0,1\}^t$ denotes the restriction of $x$ to the coordinates in $V$. In words, this communication problem corresponds to a situation when the function $f$ depends on only $t$ of the inputs $x_1,\ldots,x_n$. Alice knows the values of all the inputs $x_1,\ldots,x_n$ but does not know which $t$ of them are relevant. Bob, on the other hand, knows which $t$ inputs are relevant but does not know their values. This communication game was introduced and studied in an earlier work by the author [60] in the context of small-bias communication. For the purposes of the introduction, one can think of the $(n,t,f)$-pattern matrix as the matrix $[f(x|_V)]_{x,V}$, where $V$ ranges over the $(n/t)^t$ sets that have exactly one element from each block of the following partition:
$$\{1,2,\ldots,n\}=\Bigl\{1,2,\ldots,\frac nt\Bigr\}\cup\Bigl\{\frac nt+1,\ldots,\frac{2n}t\Bigr\}\cup\cdots\cup\Bigl\{\frac{(t-1)n}t+1,\ldots,n\Bigr\}.$$
We defer the precise definition to Section 4. Observe that restricting $V$ to be of special form only makes our results stronger.
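The simplified description above translates directly into code. The sketch below builds $[f(x|_V)]_{x,V}$ with one column per choice of an element from each of the $t$ blocks, so there are $2^n$ rows and $(n/t)^t$ columns. It follows only the introduction's informal description, not the precise definition deferred to Section 4. Indices are 0-based and the function name is our own.

```python
from itertools import product

def pattern_matrix(f, n, t):
    """Simplified (n, t, f)-pattern matrix [f(x|_V)]_{x, V} from the introduction.

    Rows: all x in {0,1}^n. Columns: all V containing exactly one index from
    each of the t consecutive blocks of size n/t (0-based indices).
    """
    if n % t != 0:
        raise ValueError("n must be a multiple of t")
    block = n // t
    # One offset per block gives (n/t)^t column sets V, each in increasing order.
    columns = [tuple(i * block + j for i, j in enumerate(offsets))
               for offsets in product(range(block), repeat=t)]
    rows = product((0, 1), repeat=n)
    matrix = [[f(tuple(x[v] for v in V)) for V in columns] for x in rows]
    return matrix, columns

# Example: f = AND on t = 2 bits (-1 = "true"), with n = 4.
M, cols = pattern_matrix(lambda z: -1 if all(z) else 1, n=4, t=2)
print(len(M), len(cols))  # 2^4 rows and (4/2)^2 columns
```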

1.1. Our results. Our main result is a lower bound on the communication 
complexity of a pattern matrix in terms of the e-approximate degree of the base 
function /. The lower bound holds for both classical and quantum protocols, regardless 
of prior entanglement.

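Theorem 1.1 (communication complexity). Let $F$ be the $(n,t,f)$-pattern matrix, where $f\colon\{0,1\}^t\to\{-1,+1\}$ is given. Then for every $\epsilon\in[0,1)$ and every $\delta<\epsilon/2$,
$$Q^*_\delta(F)\ge\frac14\deg_\epsilon(f)\log_2\Bigl(\frac nt\Bigr)-\frac12\log_2\Bigl(\frac{3}{\epsilon-2\delta}\Bigr).$$
In particular,
$$Q^*_{1/7}(F)\ge\frac14\deg_{1/3}(f)\log_2\Bigl(\frac nt\Bigr)-3.\tag{1.1}$$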
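The constant in (1.1) comes from specializing the general bound. With $\epsilon=1/3$ and $\delta=1/7$ we get $\epsilon-2\delta=1/21$, so the additive term is $\frac12\log_2 63\approx2.99\le3$. The short Python check below confirms this arithmetic. The helper name `additive_loss` is our own.

```python
import math
from fractions import Fraction

def additive_loss(eps, delta):
    """Additive term (1/2) * log2(3 / (eps - 2*delta)) from Theorem 1.1."""
    gap = Fraction(eps) - 2 * Fraction(delta)
    if gap <= 0:
        raise ValueError("Theorem 1.1 requires delta < eps / 2")
    return 0.5 * math.log2(float(3 / gap))

loss = additive_loss(Fraction(1, 3), Fraction(1, 7))
print(loss, loss <= 3)  # (1/2) * log2(63), just below 3
```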



Note that Theorem 1.1 yields lower bounds for communication complexity with error probability $\delta$ for any $\delta\in(0,1/2)$. In particular, apart from bounded-error communication (1.1), we obtain lower bounds for communication with small bias, i.e., error probability $\frac12-o(1)$. In Section 6, we derive another lower bound for small-bias communication, in terms of threshold weight $W(f,d)$.



As R. de Wolf pointed out to us [17], the lower bound (1.1) for bounded-error communication is within a polynomial of optimal. More precisely, $F$ has a classical deterministic protocol with cost $O(\deg_{1/3}(f)^6\log(n/t))$, by the results of Beals et al. […]. See Proposition 5.1 for details. In particular, Theorem 1.1 exhibits a large new class of communication problems $F$ whose quantum communication complexity is polynomially related to their classical complexity, even if prior entanglement is allowed. Before our work, the largest class of problems with polynomially related quantum and classical bounded-error complexities was the class of symmetric functions (see Theorem 1.3 below), which is broadly subsumed by Theorem 1.1. Exhibiting a polynomial relationship between the quantum and classical bounded-error complexities for all functions $F\colon X\times Y\to\{-1,+1\}$ is a longstanding open problem.

Pattern matrices are of interest because they occur as submatrices in many natural communication problems. For example, Theorem 1.1 can be interpreted in terms of function composition. Setting $n=4t$ for concreteness, we obtain:

Corollary 1.2. Let $f\colon\{0,1\}^t\to\{-1,+1\}$ be given. Define $F\colon\{0,1\}^{4t}\times\{0,1\}^{4t}\to\{-1,+1\}$ by
$$F(x,y)=f\bigl(\ldots,(x_{i,1}y_{i,1}\vee x_{i,2}y_{i,2}\vee x_{i,3}y_{i,3}\vee x_{i,4}y_{i,4}),\ldots\bigr).$$
Then
$$Q^*_{1/7}(F)\ge\frac12\deg_{1/3}(f)-3.$$
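With $n=4t$ we have $\log_2(n/t)=2$, so the leading term of (1.1) becomes $\frac12\deg_{1/3}(f)$. The link between $F$ and pattern matrices can be seen concretely. Suppose Bob's string has exactly one 1 in each block of four, say at offset $j_i$ in block $i$. Then the $i$-th OR collapses to Alice's bit $x_{i,j_i}$, so those columns of $F$ form the simplified $(4t,t,f)$-pattern matrix from the introduction.

The brute-force Python check below verifies this for small $t$. The function names are our own. The check concerns only the introduction's simplified pattern matrix, not the precise definition of Section 4.

```python
from itertools import product

def composed_F(f, t, x, y):
    """F(x, y) = f(..., OR_j (x[i][j] AND y[i][j]), ...) for t blocks of 4 bits."""
    return f(tuple(int(any(x[i][j] and y[i][j] for j in range(4))) for i in range(t)))

def one_hot(j):
    """A block of 4 bits with a single 1 at position j."""
    return tuple(int(k == j) for k in range(4))

def contains_pattern_matrix(f, t):
    """Check F(x, y) == f(x|_V) whenever y has exactly one 1 per block."""
    for bits in product((0, 1), repeat=4 * t):
        x = [bits[4 * i:4 * i + 4] for i in range(t)]
        for offsets in product(range(4), repeat=t):
            y = [one_hot(j) for j in offsets]
            restricted = tuple(x[i][offsets[i]] for i in range(t))
            if composed_F(f, t, x, y) != f(restricted):
                return False
    return True

print(contains_pattern_matrix(lambda z: (-1) ** sum(z), t=2))
```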



As another illustration of Theorem 1.1, we revisit the quantum communication complexity of symmetric functions. In this setting Alice has a string $x\in\{0,1\}^n$, Bob has a string $y\in\{0,1\}^n$, and their objective is to compute $D(\sum x_iy_i)$ for some predicate $D\colon\{0,1,\ldots,n\}\to\{-1,+1\}$ fixed in advance. This framework encompasses several familiar functions, such as disjointness (determining if $x$ and $y$ intersect) and inner product modulo 2 (determining if $x$ and $y$ intersect in an odd number of positions). In a celebrated result, Razborov [56] established optimal lower bounds on the quantum communication complexity of every function of the above form:

Theorem 1.3 (Razborov). Let $D\colon\{0,1,\ldots,n\}\to\{-1,+1\}$ be a given predicate. Put $f(x,y)=D(\sum x_iy_i)$. Then
$$Q^*_{1/3}(f)\ge\Omega\bigl(\sqrt{n\ell_0(D)}+\ell_1(D)\bigr),$$
where $\ell_0(D)\in\{0,1,\ldots,\lfloor n/2\rfloor\}$ and $\ell_1(D)\in\{0,1,\ldots,\lceil n/2\rceil\}$ are the smallest integers such that $D$ is constant in the range $[\ell_0(D),n-\ell_1(D)]$.
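The parameters $\ell_0(D)$ and $\ell_1(D)$ are easy to compute. Any admissible interval $[\ell_0,n-\ell_1]$ contains $\lfloor n/2\rfloor$, so one grows the maximal constant run of $D$ around that point. The Python sketch below does this and evaluates $\sqrt{n\ell_0(D)}+\ell_1(D)$, the growth rate in Theorem 1.3 with constants dropped. The names are ours. For disjointness it returns $(1,0)$, giving the $\sqrt n$ behavior discussed below. For inner product modulo 2 it returns $(\lfloor n/2\rfloor,\lceil n/2\rceil)$, which gives linear growth.

```python
import math

def ell_parameters(D):
    """Return (ell_0(D), ell_1(D)) for a predicate given as a list D[0..n] of +-1 values.

    They are the smallest integers with D constant on [ell_0, n - ell_1]; such an
    interval must contain floor(n/2), so we extend the constant run around it.
    """
    n = len(D) - 1
    mid = n // 2
    left = right = mid
    while left > 0 and D[left - 1] == D[mid]:
        left -= 1
    while right < n and D[right + 1] == D[mid]:
        right += 1
    return left, n - right

def razborov_expression(D):
    """sqrt(n * ell_0) + ell_1: the growth rate in Theorem 1.3, constants dropped."""
    n = len(D) - 1
    ell0, ell1 = ell_parameters(D)
    return math.sqrt(n * ell0) + ell1

n = 16
disjointness = [1] + [-1] * n                      # D(t) = 1 iff t = 0
inner_product = [(-1) ** t for t in range(n + 1)]  # D(t) = (-1)^t
print(ell_parameters(disjointness), ell_parameters(inner_product))
```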

Using Theorem 1.1, we give a new and simple proof of Razborov's result. No alternate proof was available prior to this work, despite the fact that this problem has drawn the attention of various researchers [3, …, 29, 24, 42]. Moreover, the next-best lower bounds for general predicates were nowhere close to Theorem 1.3. To illustrate, consider the disjointness predicate $D$, given by $D(t)=1\Leftrightarrow t=0$. Theorem 1.3 shows that it has communication complexity $\Omega(\sqrt n)$, while the next-best lower bound [3, …] was only $\Omega(\log n)$.

Approximate rank and trace norm. We now describe some matrix-analytic consequences of our work. The $\epsilon$-approximate rank of a matrix $F\in\{-1,+1\}^{m\times n}$, denoted $\operatorname{rk}_\epsilon F$, is the least rank of a real matrix $A$ such that $|F_{ij}-A_{ij}|\le\epsilon$ for all $i,j$. This natural analytic quantity arose in the study of quantum communication [66, 12, 56] and has since found applications to learning theory. In particular, Klivans and Sherstov […] proved that concept classes (i.e., sign matrices) with high approximate rank are beyond the scope of all known techniques for efficient learning, in Kearns' well-studied agnostic model [28]. Exponential lower bounds were derived in [35] on the approximate rank of disjunctions, majority functions, and decision lists, with the corresponding implications for agnostic learning. We broadly generalize these results on approximate rank to any functions with high approximate degree or high threshold weight:

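Theorem 1.4 (approximate rank). Let $F$ be the $(n,t,f)$-pattern matrix, where $f\colon\{0,1\}^t\to\{-1,+1\}$ is given. Then for every $\epsilon\in[0,1)$ and every $\delta\in[0,\epsilon]$,
$$\operatorname{rk}_\delta F\ge\Bigl(\frac{\epsilon-\delta}{1+\delta}\Bigr)^2\Bigl(\frac nt\Bigr)^{\deg_\epsilon(f)}.$$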

In addition, for every $\gamma\in(0,1)$ and every integer $d\ge1$,
$$\operatorname{rk}_{1-\gamma}F\ge\Bigl(\frac{\gamma}{2-\gamma}\Bigr)^2\min\Bigl\{\Bigl(\frac nt\Bigr)^d,\ \frac{W(f,d-1)}{2t}\Bigr\}.$$
We derive analogous results for the approximate trace norm, another matrix-analytic notion that has been studied in complexity theory. Theorem 1.4 is close to optimal for a broad range of parameters. See Section 8 for details.

Discrepancy. The discrepancy of a function $F\colon X\times Y\to\{-1,+1\}$, denoted $\operatorname{disc}(F)$, is a combinatorial measure of the complexity of $F$ (small discrepancy corresponds to high complexity). This complexity measure plays a central role in the study of communication. In particular, it fully characterizes membership in $\mathrm{PP}^{cc}$, the class of communication problems with efficient small-bias protocols [30]. Discrepancy is also known [43] to be equivalent to margin complexity, a key notion in learning theory. Finally, discrepancy is of interest in circuit complexity […, 22, 47]. We are able to characterize the discrepancy of every pattern matrix in terms of threshold weight:

Theorem 1.5 (discrepancy). Let $F$ be the $(n,t,f)$-pattern matrix, for a given function $f\colon\{0,1\}^t\to\{-1,+1\}$. Then
$$\operatorname{disc}(F)\le\min_{d=1,\ldots,t}\max\Biggl\{\Bigl(\frac{2t}{W(f,d-1)}\Bigr)^{1/2},\ \Bigl(\frac tn\Bigr)^{d/2}\Biggr\}.$$
As we show in Section 7, Theorem 1.5 is close to optimal. It is a substantial improvement on the author's earlier work [60].



As an application of Theorem 1.5, we revisit the discrepancy of $\mathrm{AC}^0$, the class of polynomial-size constant-depth circuits with AND, OR, NOT gates. In an earlier work [60], we obtained the first exponentially small upper bound on the discrepancy of a function in $\mathrm{AC}^0$. We used this result in [60] to prove that depth-2 majority circuits for $\mathrm{AC}^0$ require exponential size, solving an open problem due to Krause and Pudlak [37]. Using Theorem 1.5, we are able to considerably sharpen the bound in [60]. Specifically, we prove:

Theorem 1.6. Let $f(x,y)=\bigvee_{i=1}^{m}\bigwedge_{j=1}^{m^2}(x_{ij}\vee y_{ij})$. Then
$$\operatorname{disc}(f)=\exp\{-\Omega(m)\}.$$



We defer the new circuit implications and other discussion to Sections 7 and 10. Independently of the work in [60], Buhrman et al. [11] exhibited another function in $\mathrm{AC}^0$ with exponentially small discrepancy:

Theorem (Buhrman et al.). Let $f\colon\{0,1\}^n\times\{0,1\}^n\to\{-1,+1\}$ be given by $f(x,y)=\operatorname{sgn}\bigl(1+\sum_{i=1}^n(-2)^ix_iy_i\bigr)$. Then
$$\operatorname{disc}(f)=\exp\{-\Omega(n^{1/3})\}.$$
Using Theorem 1.5, we give a new and simple proof of this result.

1.2. Our techniques. The setting in which to view our work is the generalized discrepancy method, a straightforward but very useful principle introduced by Klauck [29] and reformulated in its current form by Razborov [56]. Let $F(x,y)$ be a Boolean function whose bounded-error communication complexity is of interest. The generalized discrepancy method asks for a Boolean function $H(x,y)$ and a distribution $\mu$ on $(x,y)$-pairs such that:

(1) the functions $F$ and $H$ have correlation $\Omega(1)$ under $\mu$; and

(2) all low-cost protocols have negligible advantage in computing $H$ under $\mu$.

If such $H$ and $\mu$ indeed exist, it follows that no low-cost protocol can compute $F$ to high accuracy (otherwise it would be a good predictor for the hard function $H$ as well). This method applies broadly to many models of communication, as we discuss in Section 2. It generalizes Yao's original discrepancy method […], in which $H=F$. The advantage of the generalized version is that it makes it possible, in theory, to prove lower bounds for functions such as disjointness, to which the traditional method does not apply.
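The parenthetical argument above rests on a one-line inequality. Suppose a $\pm1$-valued predictor $P$ disagrees with $F$ on a set of $\mu$-mass at most $\epsilon$. Then $\mathbf E_\mu[PH]\ge\mathbf E_\mu[FH]-2\epsilon$, because on each point of disagreement the product $PH$ differs from $FH$ by exactly 2. Hence a low-cost protocol for $F$ would achieve advantage at least $\Omega(1)-2\epsilon$ on $H$, contradicting property (2) when $\epsilon$ is small.

The randomized Python check below exercises this inequality on synthetic data. It is our own sanity check, not code from the paper, and the function names are hypothetical.

```python
import random

def correlation(mu, a, b):
    """E_mu[a(z) b(z)] for +-1 vectors a, b and a probability vector mu."""
    return sum(p * ai * bi for p, ai, bi in zip(mu, a, b))

def check_transfer(num_points=40, trials=500, seed=0):
    """Check E_mu[P H] >= E_mu[F H] - 2 Pr_mu[P != F] on random instances."""
    rng = random.Random(seed)
    for _ in range(trials):
        weights = [rng.random() for _ in range(num_points)]
        total = sum(weights)
        mu = [w / total for w in weights]
        F = [rng.choice((-1, 1)) for _ in range(num_points)]
        H = [rng.choice((-1, 1)) for _ in range(num_points)]
        # A predictor for F that is wrong on a random ~10% of the points.
        P = [-v if rng.random() < 0.1 else v for v in F]
        err = sum(p for p, pv, fv in zip(mu, P, F) if pv != fv)
        if correlation(mu, P, H) < correlation(mu, F, H) - 2 * err - 1e-12:
            return False
    return True

print(check_transfer())
```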

The hard part, of course, is finding $H$ and $\mu$ with the desired properties. Except in rather restricted cases [29, Thm. 4], it was not known how to do it. As a result, the generalized discrepancy method was of limited practical use prior to this paper. Here we overcome this difficulty, obtaining $H$ and $\mu$ for a broad range of problems, namely, the communication problems of computing $f(x|_V)$.

Pattern matrices are a crucial first ingredient of our solution. We derive an exact, closed-form expression for the singular values of a pattern matrix and their multiplicities. This spectral information reduces our search from $H$ and $\mu$ to a much smaller and simpler object, namely, a function $\psi\colon\{0,1\}^t\to\mathbb R$ with certain properties. On the one hand, $\psi$ must be well-correlated with the base function $f$. On the other hand, $\psi$ must be orthogonal to all low-degree polynomials. We establish the existence of such $\psi$ by passing to the linear programming dual of the approximate degree of $f$. Although the approximate degree and its dual are classical notions, we are not aware of any previous use of this duality to prove communication lower bounds.
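Here is a toy case of our own choosing that makes the two requirements on $\psi$ concrete. It is not the construction used in the paper. For parity on $t$ bits, $\psi=2^{-t}f$ has $\sum_x|\psi(x)|=1$ and $\sum_x\psi(x)f(x)=1$, and it is orthogonal to every $\chi_S$ with $|S|<t$. Consequently, every polynomial $p$ of degree less than $t$ satisfies $\sum_x\psi(x)p(x)=0$. This gives $1=\sum_x\psi(x)(f(x)-p(x))\le\|f-p\|_\infty$, which is the flavor of duality argument alluded to above.

The Python sketch below verifies the three facts with exact rational arithmetic. The names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def chi(S, x):
    """chi_S(x) = (-1)^(sum_{i in S} x_i)."""
    return (-1) ** sum(x[i] for i in S)

def parity_dual_check(t):
    """For f = parity on t bits and psi = 2^-t f, return (||psi||_1, sum psi*f, orthogonality flag)."""
    points = list(product((0, 1), repeat=t))
    f = {x: (-1) ** sum(x) for x in points}
    psi = {x: Fraction(f[x], 2 ** t) for x in points}
    l1_norm = sum(abs(v) for v in psi.values())
    corr = sum(psi[x] * f[x] for x in points)
    # psi must vanish against every character chi_S with |S| < t.
    orthogonal = all(sum(psi[x] * chi(S, x) for x in points) == 0
                     for k in range(t) for S in combinations(range(t), k))
    return l1_norm, corr, orthogonal

print(parity_dual_check(4))
```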

For the results that feature threshold weight, we combine the above program 
with the dual characterization of threshold weight. To derive the remaining results 
on approximate rank, approximate trace norm, and discrepancy, we apply our main 
technique along with several additional matrix-analytic and combinatorial arguments. 

1.3. Recent work on multiparty complexity. The method of this paper has recently enabled important progress in multiparty communication complexity by a number of researchers. Lee and Shraibman […] and Chattopadhyay and Ada [14] observed that our method adapts in a straightforward way to the multiparty model, thereby obtaining much improved lower bounds on the communication complexity of disjointness for up to $\log\log n$ players. David and Pitassi [15] ingeniously combined this line of work with the probabilistic method, establishing a separation of the communication classes $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$ for up to $k=(1-\epsilon)\log n$ players. Their construction was derandomized in a follow-up paper by David, Pitassi, and Viola [16], resulting in an explicit separation. See the survey article [61] for a unified guide to these results, complete with all the key proofs. A very recent development is due to Beame and Huynh-Ngoc […], who continue this line of research with improved multiparty lower bounds for $\mathrm{AC}^0$ functions.

1.4. Organization. We start with a thorough review of technical preliminaries in Section 2. The two sections that follow are concerned with the two principal ingredients of our technique, the pattern matrices and the dual characterization of the approximate degree and threshold weight. Section 5 integrates them into the generalized discrepancy method and establishes our main result, Theorem 1.1. In Section 6, we prove an additional version of our main result using threshold weight. We characterize the discrepancy of pattern matrices in Section 7. Approximate rank and approximate trace norm are studied next, in Section 8. We illustrate our main result in Section 9 by giving a new proof of Razborov's quantum lower bounds. As another illustration, we study the discrepancy of $\mathrm{AC}^0$ in Section 10. We conclude with some remarks on the well-known log-rank conjecture in Section 11 and a discussion of related work in Section 12.

2. Preliminaries. We view Boolean functions as mappings $X\to\{-1,+1\}$ for a finite set $X$, where $-1$ and $1$ correspond to "true" and "false," respectively. Typically, the domain will be $X=\{0,1\}^n$ or $X=\{0,1\}^n\times\{0,1\}^n$. A predicate is a mapping $D\colon\{0,1,\ldots,n\}\to\{-1,+1\}$. The notation $[n]$ stands for the set $\{1,2,\ldots,n\}$. For a set $S\subseteq[n]$, its characteristic vector $\mathbf 1_S\in\{0,1\}^n$ is defined by
$$(\mathbf 1_S)_i=\begin{cases}1&\text{if }i\in S,\\0&\text{otherwise.}\end{cases}$$
For $b\in\{0,1\}$, we put $\neg b=1-b$. For $x\in\{0,1\}^n$, we define $|x|=x_1+\cdots+x_n$. For $x,y\in\{0,1\}^n$, the notation $x\wedge y\in\{0,1\}^n$ refers as usual to the component-wise conjunction of $x$ and $y$. Analogously, the string $x\vee y$ stands for the component-wise disjunction of $x$ and $y$. In particular, $|x\wedge y|$ is the number of positions in which the strings $x$ and $y$ both have a 1. Throughout this manuscript, "log" refers to the logarithm to base 2. As usual, we denote the base of the natural logarithm by $e=2.718\ldots$. For any mapping $\phi\colon X\to\mathbb R$, where $X$ is a finite set, we adopt the standard notation $\|\phi\|_\infty=\max_{x\in X}|\phi(x)|$. We adopt the standard definition of the sign function:
$$\operatorname{sgn}t=\begin{cases}-1&\text{if }t<0,\\0&\text{if }t=0,\\1&\text{if }t>0.\end{cases}$$

Finally, we recall the Fourier transform over $\mathbb Z_2^n$. Consider the vector space of functions $\{0,1\}^n\to\mathbb R$, equipped with the inner product
$$\langle f,g\rangle=2^{-n}\sum_{x\in\{0,1\}^n}f(x)g(x).$$
For $S\subseteq[n]$, define $\chi_S\colon\{0,1\}^n\to\{-1,+1\}$ by $\chi_S(x)=(-1)^{\sum_{i\in S}x_i}$. Then $\{\chi_S\}_{S\subseteq[n]}$ is an orthonormal basis for the inner product space in question. As a result, every function $f\colon\{0,1\}^n\to\mathbb R$ has a unique representation of the form
$$f(x)=\sum_{S\subseteq[n]}\hat f(S)\chi_S(x),$$
where $\hat f(S)=\langle f,\chi_S\rangle$. The reals $\hat f(S)$ are called the Fourier coefficients of $f$. The degree of $f$, denoted $\deg(f)$, is the quantity $\max\{|S|:\hat f(S)\ne0\}$. The orthonormality of $\{\chi_S\}$ immediately yields Parseval's identity:
$$\sum_{S\subseteq[n]}\hat f(S)^2=\langle f,f\rangle=\mathop{\mathbf E}_x\bigl[f(x)^2\bigr].\tag{2.1}$$
The following fact is immediate from the definition of $\hat f(S)$:

Proposition 2.1. Let $f\colon\{0,1\}^n\to\mathbb R$ be given. Then
$$\max_{S\subseteq[n]}|\hat f(S)|\le2^{-n}\sum_{x\in\{0,1\}^n}|f(x)|.$$
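For small $n$ these definitions can be checked by brute force. The sketch below computes all Fourier coefficients exactly with Python fractions. It then checks Parseval's identity (2.1), the degree, and Proposition 2.1 on majority of three bits, with $-1$ meaning "true". Its expansion is $\frac12(\chi_{\{1\}}+\chi_{\{2\}}+\chi_{\{3\}})-\frac12\chi_{\{1,2,3\}}$. The code indexes coordinates from 0, and all names are our own.

```python
from fractions import Fraction
from itertools import combinations, product

def chi(S, x):
    """chi_S(x) = (-1)^(sum_{i in S} x_i)."""
    return (-1) ** sum(x[i] for i in S)

def fourier_coefficients(f, n):
    """Exact hat f(S) = 2^-n sum_x f(x) chi_S(x), keyed by sorted tuples S."""
    points = list(product((0, 1), repeat=n))
    return {S: Fraction(sum(f(x) * chi(S, x) for x in points), 2 ** n)
            for k in range(n + 1) for S in combinations(range(n), k)}

def fourier_degree(coeffs):
    """max{|S| : hat f(S) != 0}; returns 0 for the zero function by our convention."""
    return max((len(S) for S, c in coeffs.items() if c != 0), default=0)

def maj3(x):
    """Majority on 3 bits with -1 meaning "true" (at least two ones)."""
    return -1 if sum(x) >= 2 else 1

coeffs = fourier_coefficients(maj3, 3)
parseval_lhs = sum(c * c for c in coeffs.values())
mean_square = Fraction(sum(maj3(x) ** 2 for x in product((0, 1), repeat=3)), 8)
print(parseval_lhs == mean_square, fourier_degree(coeffs))
```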



A Boolean function $f\colon\{0,1\}^n\to\{-1,+1\}$ is called symmetric if $f(x)$ is uniquely determined by $\sum x_i$. Equivalently, a Boolean function $f$ is symmetric if and only if
$$f(x_1,x_2,\ldots,x_n)=f(x_{\sigma(1)},x_{\sigma(2)},\ldots,x_{\sigma(n)})$$
for all inputs $x\in\{0,1\}^n$ and all permutations $\sigma\colon[n]\to[n]$. Note that there is a one-to-one correspondence between predicates and symmetric Boolean functions. Namely, one associates a predicate $D$ with the symmetric function $f(x)=D(\sum x_i)$.
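Both sides of this correspondence can be checked mechanically for small $n$. The sketch below first tests invariance under all permutations. It then reads off the associated predicate $D$ by evaluating $f$ on the inputs $1^k0^{n-k}$, which is valid only once symmetry has been confirmed. The function names are ours.

```python
from itertools import permutations, product

def is_symmetric(f, n):
    """Check f(x_1..x_n) == f(x_sigma(1)..x_sigma(n)) for all x and all permutations sigma."""
    return all(f(x) == f(tuple(x[s] for s in sigma))
               for x in product((0, 1), repeat=n)
               for sigma in permutations(range(n)))

def predicate_of(f, n):
    """For symmetric f, recover D with f(x) = D(sum x_i) by evaluating f on 1^k 0^(n-k)."""
    return [f((1,) * k + (0,) * (n - k)) for k in range(n + 1)]

def maj3(x):
    """Majority on 3 bits with -1 meaning "true"."""
    return -1 if sum(x) >= 2 else 1

print(is_symmetric(maj3, 3), predicate_of(maj3, 3))
```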

2.1. Matrix analysis. We draw freely on basic notions from matrix analysis. 
In particular, we assume familiarity with the singular value decomposition; positive 
semidefinite matrices; matrix similarity; matrix trace and its properties; the Kronecker 
product and its spectral properties; the relation between singular values and eigenvalues; and eigenvalue computation for matrices of simple form. An excellent reference on the subject is [23]. The review below is limited to notation and the more substantial results.
The symbol $\mathbb R^{m\times n}$ refers to the family of all $m\times n$ matrices with real entries. We specify matrices by their generic entry, e.g., $A=[F(i,j)]_{i,j}$. In most matrices that arise in this work, the exact ordering of the columns (and rows) is irrelevant. In such cases we describe a matrix by the notation $[F(i,j)]_{i\in I,\,j\in J}$, where $I$ and $J$ are some index sets. We denote the rank of $A\in\mathbb R^{m\times n}$ by $\operatorname{rk}A$. We also write
$$\|A\|_\infty=\max_{i,j}|A_{ij}|,\qquad\|A\|_1=\sum_{i,j}|A_{ij}|.$$
We denote the singular values of $A$ by $\sigma_1(A)\ge\sigma_2(A)\ge\cdots\ge\sigma_{\min\{m,n\}}(A)\ge0$. Recall that the spectral norm, trace norm, and Frobenius norm of $A$ are given by
$$\|A\|=\max_{\|x\|=1}\|Ax\|=\sigma_1(A),\qquad\|A\|_\Sigma=\sum_i\sigma_i(A),\qquad\|A\|_F=\Bigl(\sum_{i,j}A_{ij}^2\Bigr)^{1/2}=\Bigl(\sum_i\sigma_i(A)^2\Bigr)^{1/2}.$$
For a square matrix $A\in\mathbb R^{n\times n}$, its trace is given by $\operatorname{tr}A=\sum A_{ii}$.

Recall that every matrix $A\in\mathbb R^{m\times n}$ has a singular value decomposition $A=U\Sigma V^{\mathsf T}$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is diagonal with entries $\sigma_1(A),\sigma_2(A),\ldots,\sigma_{\min\{m,n\}}(A)$. For $A,B\in\mathbb R^{m\times n}$, we write $\langle A,B\rangle=\sum_{i,j}A_{ij}B_{ij}=\operatorname{tr}(AB^{\mathsf T})$. A useful consequence of the singular value decomposition is:
$$\langle A,B\rangle\le\|A\|\,\|B\|_\Sigma\qquad(A,B\in\mathbb R^{m\times n}).\tag{2.2}$$
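These quantities are available in NumPy. `np.linalg.svd(A, compute_uv=False)` returns the singular values, and `np.linalg.norm` accepts `ord=2`, `"nuc"`, and `"fro"` for the spectral, trace (nuclear), and Frobenius norms of a matrix. The sketch below checks the identities above and inequality (2.2) on random matrices. It assumes NumPy is installed, and the helper name is ours.

```python
import numpy as np

def spectral_trace_frobenius(A):
    """Return (||A||, ||A||_Sigma, ||A||_F) computed from the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values in decreasing order
    return s[0], s.sum(), np.sqrt(np.sum(s ** 2))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((4, 6))

spec, trace, frob = spectral_trace_frobenius(A)
# Cross-check against NumPy's built-in matrix norms.
assert np.isclose(spec, np.linalg.norm(A, 2))
assert np.isclose(trace, np.linalg.norm(A, "nuc"))
assert np.isclose(frob, np.linalg.norm(A, "fro"))

inner = np.sum(A * B)                                                   # <A, B>
assert np.isclose(inner, np.trace(A @ B.T))                             # = tr(A B^T)
assert inner <= np.linalg.norm(A, 2) * np.linalg.norm(B, "nuc") + 1e-12  # (2.2)
```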



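Following [56], we define the $\epsilon$-approximate trace norm of a matrix $F\in\mathbb R^{m\times n}$ by
$$\|F\|_{\Sigma,\epsilon}=\min\{\|A\|_\Sigma:\|F-A\|_\infty\le\epsilon\}.$$
The next proposition is a trivial consequence of (2.2).

Proposition 2.2. Let $F\in\mathbb R^{m\times n}$ and $\epsilon\ge0$ be given. Then
$$\|F\|_{\Sigma,\epsilon}\ge\sup_{\Psi\ne0}\frac{\langle F,\Psi\rangle-\epsilon\|\Psi\|_1}{\|\Psi\|}.$$
Proof. Fix any $\Psi\ne0$ and $A$ such that $\|F-A\|_\infty\le\epsilon$. Then $\langle A,\Psi\rangle\le\|A\|_\Sigma\|\Psi\|$ by (2.2). On the other hand, $\langle A,\Psi\rangle\ge\langle F,\Psi\rangle-\|A-F\|_\infty\|\Psi\|_1\ge\langle F,\Psi\rangle-\epsilon\|\Psi\|_1$. Comparing these two estimates of $\langle A,\Psi\rangle$ gives the sought lower bound on $\|A\|_\Sigma$. □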
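Proposition 2.2 turns any nonzero $\Psi$ into a certificate. Here is a small example of our own. Let $F$ be the $4\times4$ Sylvester-Hadamard sign matrix, whose singular values all equal 2, and take $\Psi=F$. The bound gives $(16-16\epsilon)/2=8(1-\epsilon)$. Meanwhile $A=(1-\epsilon)F$ is feasible with $\|A\|_\Sigma=8(1-\epsilon)$. The two estimates therefore meet, and $\|F\|_{\Sigma,\epsilon}=8(1-\epsilon)$ for this $F$. The NumPy sketch below (assumes NumPy; names are ours) reproduces this for $\epsilon=1/4$.

```python
import numpy as np

def dual_lower_bound(F, Psi, eps):
    """Proposition 2.2: (<F, Psi> - eps * ||Psi||_1) / ||Psi|| <= ||F||_{Sigma,eps} for Psi != 0."""
    return (np.sum(F * Psi) - eps * np.abs(Psi).sum()) / np.linalg.norm(Psi, 2)

H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
F = np.kron(H2, H2)        # 4 x 4 Sylvester-Hadamard sign matrix
eps = 0.25

lower = dual_lower_bound(F, F, eps)   # witness Psi = F
A = (1 - eps) * F                     # feasible: ||F - A||_inf = eps
upper = np.linalg.norm(A, "nuc")      # hence ||F||_{Sigma,eps} <= ||A||_Sigma
print(lower, upper)                   # both are 8 * (1 - eps) = 6, up to rounding
```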

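Following [12], we define the $\epsilon$-approximate rank of a matrix $F\in\mathbb R^{m\times n}$ by
$$\operatorname{rk}_\epsilon F=\min\{\operatorname{rk}A:\|F-A\|_\infty\le\epsilon\}.$$
The approximate rank and approximate trace norm are related by virtue of the singular value decomposition, as follows.

Proposition 2.3. Let $F\in\mathbb R^{m\times n}$ and $\epsilon\ge0$ be given. Then
$$\operatorname{rk}_\epsilon F\ge\frac{(\|F\|_{\Sigma,\epsilon})^2}{\sum_{i,j}(|F_{ij}|+\epsilon)^2}.$$
Proof (adapted from [33]). Fix $A$ with $\|F-A\|_\infty\le\epsilon$. Then, using the Cauchy-Schwarz inequality over the $\operatorname{rk}A$ nonzero singular values of $A$ and the bound $|A_{ij}|\le|F_{ij}|+\epsilon$,
$$\|F\|_{\Sigma,\epsilon}\le\|A\|_\Sigma\le\|A\|_F\sqrt{\operatorname{rk}A}\le\Bigl(\sum_{i,j}(|F_{ij}|+\epsilon)^2\Bigr)^{1/2}\sqrt{\operatorname{rk}A}.\ □$$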
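The key step $\|A\|_\Sigma\le\|A\|_F\sqrt{\operatorname{rk}A}$ is easy to test numerically. The sketch below checks it on random low-rank matrices. It also evaluates the bound of Proposition 2.3 for the $4\times4$ Sylvester-Hadamard matrix with $\epsilon=0$. There the bound recovers the exact rank 4, since $\|F\|_\Sigma=8$ and $\sum_{i,j}F_{ij}^2=16$. It assumes NumPy, and the names are ours.

```python
import numpy as np

def rank_lower_bound(F, eps, trace_norm_lower):
    """Proposition 2.3: rk_eps F >= (||F||_{Sigma,eps})^2 / sum_ij (|F_ij| + eps)^2.

    Any valid lower bound on ||F||_{Sigma,eps} may be passed as trace_norm_lower.
    """
    return trace_norm_lower ** 2 / np.sum((np.abs(F) + eps) ** 2)

# Key step of the proof: ||A||_Sigma <= ||A||_F * sqrt(rk A).
rng = np.random.default_rng(1)
for r in range(1, 5):
    A = rng.standard_normal((6, r)) @ rng.standard_normal((r, 5))  # rank r with probability 1
    k = np.linalg.matrix_rank(A)
    assert np.linalg.norm(A, "nuc") <= np.linalg.norm(A, "fro") * np.sqrt(k) + 1e-9

# 4 x 4 Sylvester-Hadamard matrix; with eps = 0, ||F||_{Sigma,0} = ||F||_Sigma = 8.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
F = np.kron(H2, H2)
bound = rank_lower_bound(F, 0.0, np.linalg.norm(F, "nuc"))
print(bound, np.linalg.matrix_rank(F))
```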

We will also need a well-known bound on the trace norm of a matrix product, which we state with a proof for the reader's convenience.

Proposition 2.4. For all real matrices $A$ and $B$ of compatible dimensions,
$$\|AB\|_\Sigma\le\|A\|_F\,\|B\|_F.$$
Proof. Write the singular value decomposition $AB=U\Sigma V^{\mathsf T}$. Let $u_1,u_2,\ldots$ and $v_1,v_2,\ldots$ stand for the columns of $U$ and $V$, respectively. By definition, $\|AB\|_\Sigma$ is the sum of the diagonal entries of $\Sigma$. We have:
$$\begin{aligned}\|AB\|_\Sigma&=\sum_i(U^{\mathsf T}ABV)_{ii}=\sum_i(u_i^{\mathsf T}A)(Bv_i)\le\sum_i\|A^{\mathsf T}u_i\|\,\|Bv_i\|\\&\le\sqrt{\sum_i\|A^{\mathsf T}u_i\|^2}\sqrt{\sum_i\|Bv_i\|^2}\le\|U^{\mathsf T}A\|_F\,\|BV\|_F=\|A\|_F\,\|B\|_F.\ □\end{aligned}$$
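The following is a quick numerical check of Proposition 2.4 on random matrices of random compatible shapes. It assumes NumPy, and the names are ours. The example also shows the bound is tight when $A=B^{\mathsf T}$: $B^{\mathsf T}B$ is positive semidefinite, so its trace norm equals its trace, which is $\|B\|_F^2$.

```python
import numpy as np

def check_trace_norm_of_product(trials=200, seed=2):
    """Check ||AB||_Sigma <= ||A||_F ||B||_F (Proposition 2.4) on random matrices."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        m, k, n = rng.integers(1, 7, size=3)
        A = rng.standard_normal((m, k))
        B = rng.standard_normal((k, n))
        lhs = np.linalg.norm(A @ B, "nuc")
        rhs = np.linalg.norm(A, "fro") * np.linalg.norm(B, "fro")
        if lhs > rhs + 1e-9:
            return False
    return True

# Tightness for A = B^T: B^T B is PSD, so ||B^T B||_Sigma = tr(B^T B) = ||B||_F^2.
B = np.random.default_rng(3).standard_normal((5, 3))
tight_gap = np.linalg.norm(B.T @ B, "nuc") - np.linalg.norm(B, "fro") ** 2
print(check_trace_norm_of_product(), abs(tight_gap) < 1e-9)
```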



2.2. Approximation and sign-representation. For a function $f\colon\{0,1\}^n\to\mathbb R$, we define
$$E(f,d)=\min_p\|f-p\|_\infty,$$
where the minimum is over real polynomials of degree up to $d$. The $\epsilon$-approximate degree of $f$, denoted $\deg_\epsilon(f)$, is the least $d$ with $E(f,d)\le\epsilon$. In words, the $\epsilon$-approximate degree of $f$ is the least degree of a polynomial that approximates $f$ uniformly within $\epsilon$.

For a Boolean function $f\colon\{0,1\}^n\to\{-1,+1\}$, the $\epsilon$-approximate degree is of particular interest for $\epsilon=1/3$. The choice of $\epsilon=1/3$ is a convention and can be replaced by any other constant in $(0,1)$, without affecting $\deg_\epsilon(f)$ by more than a multiplicative constant. Another well-studied notion is the threshold degree $\deg_\pm(f)$, defined for a Boolean function $f\colon\{0,1\}^n\to\{-1,+1\}$ as the least degree of a real polynomial $p$ with $f(x)\equiv\operatorname{sgn}p(x)$. In words, $\deg_\pm(f)$ is the least degree of a polynomial that represents $f$ in sign.

So far we have considered representations of Boolean functions by real polynomials. Restricting the polynomials to have integer coefficients yields another heavily studied representation scheme. The main complexity measure here is the sum of the absolute values of the coefficients. Specifically, for a Boolean function $f\colon\{0,1\}^n\to\{-1,+1\}$, its degree-$d$ threshold weight $W(f,d)$ is defined to be the minimum $\sum_{|S|\le d}|\lambda_S|$ over all integers $\lambda_S$ such that
$$f(x)\equiv\operatorname{sgn}\Bigl(\sum_{S\subseteq\{1,\ldots,n\},\ |S|\le d}\lambda_S\chi_S(x)\Bigr).$$
If no such integers $\lambda_S$ can be found, we put $W(f,d)=\infty$. It is straightforward to verify that the following three conditions are equivalent: $W(f,d)=\infty$; $E(f,d)=1$; $d<\deg_\pm(f)$. In all expressions involving $W(f,d)$, we adopt the standard convention that $1/\infty=0$ and $\min\{t,\infty\}=t$ for any real $t$.

As one might expect, representations of Boolean functions by real and integer 
polynomials are closely related. In particular, we have the following relationship 
between E{f,d) and W{f,d). 

Theorem 2.5. Let $f\colon\{0,1\}^n\to\{-1,+1\}$ be given. Then for $d=0,1,\ldots,n$,



with the convention that $1/0=\infty$.

Since Theorem 2.5 is not directly used in our derivations, we defer its proof to Appendix A. Similar statements have been noted earlier by several authors [38, 11]. We close this section with Paturi's tight estimate [50] of the approximate degree for each symmetric Boolean function.

Theorem 2.6 (Paturi). Let $f\colon\{0,1\}^n\to\{-1,+1\}$ be a given function such that $f(x)=D(\sum x_i)$ for some predicate $D\colon\{0,1,\ldots,n\}\to\{-1,+1\}$. Then
$$\deg_{1/3}(f)=\Theta\bigl(\sqrt{n\ell_0(D)}+\sqrt{n\ell_1(D)}\bigr),$$
where $\ell_0(D)\in\{0,1,\ldots,\lfloor n/2\rfloor\}$ and $\ell_1(D)\in\{0,1,\ldots,\lceil n/2\rceil\}$ are the smallest integers such that $D$ is constant in the range $[\ell_0(D),n-\ell_1(D)]$.

2.3. Quantum communication. This section reviews the quantum model of communication complexity. We include this review mainly for completeness; our proofs rely solely on a basic matrix-analytic property of such protocols and on no other aspect of quantum communication.

There are several equivalent ways to describe a quantum communication protocol. Our description closely follows Razborov […]. Let $\mathcal A$ and $\mathcal B$ be complex finite-dimensional Hilbert spaces. Let $\mathcal C$ be a Hilbert space of dimension 2, whose orthonormal basis we denote by $|0\rangle,|1\rangle$. Consider the tensor product $\mathcal A\otimes\mathcal C\otimes\mathcal B$, which is itself a Hilbert space with an inner product inherited from $\mathcal A$, $\mathcal C$, and $\mathcal B$. The state of a quantum system is a unit vector in $\mathcal A\otimes\mathcal C\otimes\mathcal B$, and conversely any such unit vector corresponds to a distinct quantum state. The quantum system starts in a given state and traverses a sequence of states, each obtained from the previous one via a unitary transformation chosen according to the protocol. Formally, a quantum communication protocol is a finite sequence of unitary transformations
$$U_1\otimes I_{\mathcal B},\quad I_{\mathcal A}\otimes U_2,\quad U_3\otimes I_{\mathcal B},\quad I_{\mathcal A}\otimes U_4,\quad\ldots,\quad U_{2k-1}\otimes I_{\mathcal B},\quad I_{\mathcal A}\otimes U_{2k},$$
where: $I_{\mathcal A}$ and $I_{\mathcal B}$ are the identity transformations in $\mathcal A$ and $\mathcal B$, respectively; $U_1,U_3,\ldots,U_{2k-1}$ are unitary transformations in $\mathcal A\otimes\mathcal C$; and $U_2,U_4,\ldots,U_{2k}$ are unitary transformations in $\mathcal C\otimes\mathcal B$. The cost of the protocol is the length of this sequence, namely, $2k$. On Alice's input $x\in X$ and Bob's input $y\in Y$ (where $X,Y$ are given finite sets), the computation proceeds as follows.

1. The quantum system starts out in an initial state $\mathrm{Initial}(x,y)$.

2. Through successive applications of the above unitary transformations, the system reaches the state
$$\mathrm{Final}(x,y)=(I_{\mathcal A}\otimes U_{2k})(U_{2k-1}\otimes I_{\mathcal B})\cdots(I_{\mathcal A}\otimes U_2)(U_1\otimes I_{\mathcal B})\,\mathrm{Initial}(x,y).$$
3. Let $v$ denote the projection of $\mathrm{Final}(x,y)$ onto $\mathcal A\otimes\operatorname{span}(|1\rangle)\otimes\mathcal B$. The output of the protocol is 1 with probability $\langle v,v\rangle$, and 0 with the complementary probability $1-\langle v,v\rangle$.

All that remains is to specify how the initial state $\mathrm{Initial}(x,y)\in\mathcal A\otimes\mathcal C\otimes\mathcal B$ is constructed from $x,y$. It is here that the model with prior entanglement differs from the model without prior entanglement. In the model without prior entanglement, $\mathcal A$ and $\mathcal B$ have orthonormal bases $\{|x,w\rangle:x\in X,\,w\in W\}$ and $\{|y,w\rangle:y\in Y,\,w\in W\}$, respectively, where $W$ is a finite set corresponding to the private workspace of each of the parties. The initial state is the pure state
$$\mathrm{Initial}(x,y)=|x,0\rangle\,|0\rangle\,|y,0\rangle,$$
where $0\in W$ is a certain fixed element. In the model with prior entanglement, the spaces $\mathcal A$ and $\mathcal B$ have orthonormal bases $\{|x,w,e\rangle:x\in X,\,w\in W,\,e\in E\}$ and $\{|y,w,e\rangle:y\in Y,\,w\in W,\,e\in E\}$, respectively, where $W$ is as before and $E$ is a finite set corresponding to the prior entanglement. The initial state is now the entangled state
$$\mathrm{Initial}(x,y)=\frac{1}{\sqrt{|E|}}\sum_{e\in E}|x,0,e\rangle\,|0\rangle\,|y,0,e\rangle.$$
Apart from finite size, no assumptions are made about $W$ or $E$. In particular, the model with prior entanglement allows for an unlimited supply of entangled qubits. This mirrors the unlimited supply of shared random bits in the classical public-coin randomized model.

Let $f\colon X\times Y\to\{-1,+1\}$ be a given function. A quantum protocol $P$ is said to compute $f$ with error $\epsilon$ if
$$\mathbf P\bigl[f(x,y)=(-1)^{P(x,y)}\bigr]\ge1-\epsilon$$
for all $x,y$, where the random variable $P(x,y)\in\{0,1\}$ is the output of the protocol on input $(x,y)$. Let $Q_\epsilon(f)$ denote the least cost of a quantum protocol without prior entanglement that computes $f$ with error $\epsilon$. Define $Q^*_\epsilon(f)$ analogously for protocols with prior entanglement. The precise choice of a constant $0<\epsilon<1/2$ affects $Q_\epsilon(f)$ and $Q^*_\epsilon(f)$ by at most a constant factor, and thus the setting $\epsilon=1/3$ entails no loss of generality.

Let $D\colon\{0,1,\ldots,n\}\to\{-1,+1\}$ be a predicate. We associate with $D$ the function $f\colon\{0,1\}^n\times\{0,1\}^n\to\{-1,+1\}$ defined by $f(x,y)=D(\sum x_iy_i)$. We let $Q_\epsilon(D)=Q_\epsilon(f)$ and $Q^*_\epsilon(D)=Q^*_\epsilon(f)$. More generally, by computing $D$ in the quantum model we mean computing the associated function $f$. We write $R_\epsilon(f)$ for the least cost of a classical public-coin protocol for $f$ that errs with probability at most $\epsilon$ on any given input. Another classical model that figures in this paper is the deterministic model. We let $D(f)$ denote the deterministic communication complexity of $f$. Throughout this paper, by the communication complexity of a Boolean matrix $F=[F_{ij}]_{i\in I,\,j\in J}$ we will mean the communication complexity of the associated function $f\colon I\times J\to\{-1,+1\}$, given by $f(i,j)=F_{ij}$.

2.4. The generalized discrepancy method. The generalized discrepancy method is an intuitive and elegant technique for proving communication lower bounds. A starting point in our discussion is the following fact due to Linial and Shraibman [42, Lem. 10], with closely analogous statements established earlier by Yao, Kremer, and Razborov [55].

Theorem 2.7. Let $X,Y$ be finite sets. Let $P$ be a quantum protocol (with or without prior entanglement) with cost $C$ qubits and input sets $X$ and $Y$. Then
$$\bigl[\mathbf E[P(x,y)]\bigr]_{x\in X,\,y\in Y}=AB$$
for some real matrices $A,B$ with $\|A\|_F\le2^C\sqrt{|X|}$ and $\|B\|_F\le2^C\sqrt{|Y|}$.

Theorem 2.7 states that the matrix of acceptance probabilities of every low-cost protocol $P$ has a nontrivial factorization. This transition from quantum protocols to matrix factorization is a standard technique and has been used by various authors in various contexts.
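To make the model of Section 2.3 concrete, here is a toy cost-2 protocol ($k=1$) of our own, simulated with NumPy state vectors. All of the following are our assumptions for illustration:

- each of $\mathcal A$, $\mathcal C$, $\mathcal B$ is a single qubit, so the workspace $W$ is trivial and there is no prior entanglement;
- Alice's $U_1$ copies $x$ into the channel qubit with a CNOT;
- Bob's $U_2$ XORs $y$ into the channel qubit.

The channel qubit then holds $x\oplus y$, so the matrix of acceptance probabilities $[\mathbf E[P(x,y)]]$ is the $2\times2$ anti-diagonal matrix. The protocol therefore computes $f(x,y)=(-1)^{x\oplus y}$ with error 0. For a protocol this small, the factorization in Theorem 2.7 is immediate. The point is only to illustrate the initial state and the projection rule in step 3.

```python
import numpy as np

I2 = np.eye(2)
# Two-qubit CNOTs in the basis |q1 q2>, index 2*q1 + q2.
CNOT_12 = np.array([[1, 0, 0, 0],   # flips q2 when q1 = 1
                    [0, 1, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
CNOT_21 = np.array([[1, 0, 0, 0],   # flips q1 when q2 = 1
                    [0, 0, 0, 1],
                    [0, 0, 1, 0],
                    [0, 1, 0, 0]], dtype=float)

U1 = CNOT_12  # Alice, acting on A (x) C: copy x into the channel qubit
U2 = CNOT_21  # Bob, acting on C (x) B: XOR y into the channel qubit

def basis(bit):
    return np.eye(2)[bit]

def acceptance_probability(x, y):
    """Pr[output = 1] for the protocol (U1 (x) I_B, I_A (x) U2); registers ordered A, C, B."""
    initial = np.kron(np.kron(basis(x), basis(0)), basis(y))   # |x>|0>|y>
    final = np.kron(I2, U2) @ np.kron(U1, I2) @ initial
    channel_is_one = (np.arange(8) // 2) % 2 == 1              # index = 4a + 2c + b
    v = np.where(channel_is_one, final, 0.0)                   # projection onto A (x) span(|1>) (x) B
    return float(v @ v)

Pi = np.array([[acceptance_probability(x, y) for y in (0, 1)] for x in (0, 1)])
print(Pi)  # the protocol outputs 1 exactly when x != y
```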

The generalized discrepancy method was first applied by Klauck pS", Thm. 4] and 
reformulated more broadly by Razborov [56 . The treatment in (56j is informal. In 
what follows, we propose a precise formulation of the generalized discrepancy method 
and supply a proof. 

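Theorem 2.8 (generalized discrepancy method). Let $X, Y$ be finite sets and $f\colon X \times Y \to \{-1,+1\}$ a given function. Let $\Psi = [\Psi_{xy}]_{x \in X,\, y \in Y}$ be any real matrix with $\|\Psi\|_1 = 1$. Then for each $\epsilon > 0$,

$$4^{Q_\epsilon(f)} \ge 4^{Q^*_\epsilon(f)} \ge \frac{\langle \Psi, F\rangle - 2\epsilon}{3\,\|\Psi\|\,\sqrt{|X|\,|Y|}},$$

where $F = [f(x,y)]_{x \in X,\, y \in Y}$.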

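Proof. Let $P$ be a quantum protocol with prior entanglement that computes $f$ with error $\epsilon$ and cost $C$. Put

$$\Pi = \Big[\mathbf{E}\,[P(x,y)]\Big]_{x \in X,\; y \in Y}.$$

Then we can write $F = (J - 2\Pi) + 2E$, where $J$ is the all-ones matrix and $E$ is some matrix with $\|E\|_\infty \le \epsilon$. As a result,

$$\langle \Psi, J - 2\Pi\rangle = \langle \Psi, F\rangle - 2\langle \Psi, E\rangle \ge \langle \Psi, F\rangle - 2\epsilon\|\Psi\|_1 = \langle \Psi, F\rangle - 2\epsilon. \tag{2.3}$$

On the other hand, Theorem 2.7 guarantees the existence of matrices $A$ and $B$ with $AB = \Pi$ and $\|A\|_F\,\|B\|_F \le 4^C\sqrt{|X|\,|Y|}$. Therefore,

$$\begin{aligned}
\langle \Psi, J - 2\Pi\rangle &\le \|\Psi\|\,\|J - 2\Pi\|_\Sigma && \text{by duality of the spectral and trace norms}\\
&\le \|\Psi\|\left(\sqrt{|X|\,|Y|} + 2\|\Pi\|_\Sigma\right) && \text{since } \|J\|_\Sigma = \sqrt{|X|\,|Y|}\\
&\le \|\Psi\|\left(\sqrt{|X|\,|Y|} + 2\|A\|_F\,\|B\|_F\right) && \text{since } \|AB\|_\Sigma \le \|A\|_F\,\|B\|_F\\
&\le \|\Psi\|\,(2\cdot 4^C + 1)\sqrt{|X|\,|Y|}.
\end{aligned} \tag{2.4}$$

The theorem follows by comparing (2.3) and (2.4), since $2\cdot 4^C + 1 \le 3\cdot 4^C$. □

Remark 2.9. Theorem 2.8 is not to be confused with Razborov's multidimensional technique, also found in [56], which we will have no occasion to use or describe.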
We will now abstract away the particulars of Theorem 2.8 and articulate the fundamental mathematical technique in question. This will clarify the generalized discrepancy method and show that it is simply an extension of Yao's original discrepancy method [40, §3.5]. Let $f\colon X \times Y \to \{-1,+1\}$ be a given function whose communication complexity we wish to estimate. The underlying communication model is irrelevant at this point. Suppose we can find a function $h\colon X \times Y \to \{-1,+1\}$ and a distribution $\mu$ on $X \times Y$ that satisfy the following two properties.

1. Correlation. The functions $f$ and $h$ are well correlated under $\mu$:

$$\mathop{\mathbf{E}}_{(x,y)\sim\mu}\big[f(x,y)\,h(x,y)\big] \ge \epsilon, \tag{2.5}$$

where $\epsilon > 0$ is a given constant.

2. Hardness. No low-cost protocol $P$ in the given model of communication can compute $h$ to a substantial advantage under $\mu$. Formally, if $P\colon X \times Y \to \{0,1\}$ is a protocol in the given model with cost $C$ bits, then

$$\mathop{\mathbf{E}}_{(x,y)\sim\mu}\Big[h(x,y)\,\mathbf{E}\big[(-1)^{P(x,y)}\big]\Big] \le 2^{O(C)}\,\gamma, \tag{2.6}$$

where $\gamma = o(1)$. The inner expectation in (2.6) is over the internal operation of the protocol on the fixed input $(x,y)$.
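If the above two conditions hold, we claim that any protocol in the given model that computes $f$ with error at most $\epsilon/3$ on each input must have cost $\Omega(\log(\epsilon/\gamma))$. Indeed, let $P$ be a protocol with $\mathbf{P}[P(x,y) \ne f(x,y)] \le \epsilon/3$ for all $x, y$. Then standard manipulations reveal:

$$\mathop{\mathbf{E}}_{(x,y)\sim\mu}\Big[h(x,y)\,\mathbf{E}\big[(-1)^{P(x,y)}\big]\Big] \ge \mathop{\mathbf{E}}_{(x,y)\sim\mu}\big[f(x,y)\,h(x,y)\big] - 2\cdot\frac{\epsilon}{3} \ge \frac{\epsilon}{3},$$

where the last step uses (2.5). In view of (2.6), this shows that $P$ must have cost $\Omega(\log(\epsilon/\gamma))$.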

We attach the term generalized discrepancy method to this abstract framework. 
Readers with background in communication complexity will note that the original 
discrepancy method of Yao [40, §3.5] corresponds to the case when $f = h$ and the
communication takes place in the two-party randomized model. 

The purpose of our abstract discussion was to expose the fundamental mathe- 
matical technique in question, which is independent of the communication model. 
Indeed, the communication model enters the picture only in the proof of (2.6). It is here that the analysis must exploit the particularities of the model. To place an upper bound on the advantage under $\mu$ in the quantum model with entanglement, as we see from (2.4), one considers the quantity $\|\Psi\|\sqrt{|X|\,|Y|}$, where $\Psi = [h(x,y)\mu(x,y)]_{x,y}$. In the classical randomized model, the quantity to estimate happens to be

$$\max_{S \subseteq X,\; T \subseteq Y}\ \left|\sum_{x \in S}\sum_{y \in T}\mu(x,y)\,h(x,y)\right|,$$

which is known as the discrepancy of $h$ under $\mu$.
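For intuition, the discrepancy defined above can be evaluated by brute force on tiny matrices. The Python sketch below is purely illustrative: the helper names and the $2 \times 2$ example are ours, not the paper's. It enumerates every rectangle $S \times T$, so its running time is exponential in $|X| + |Y|$.

```python
from itertools import chain, combinations

def subsets(items):
    """All subsets of a finite list, including the empty set."""
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def discrepancy(h, mu):
    """max over S x T of |sum_{x in S, y in T} mu(x,y) h(x,y)|; h, mu indexed [x][y]."""
    xs, ys = list(range(len(h))), list(range(len(h[0])))
    best = 0.0
    for S in subsets(xs):
        for T in subsets(ys):
            best = max(best, abs(sum(mu[x][y] * h[x][y] for x in S for y in T)))
    return best

# Example: h(x, y) = (-1)^{xy} on X = Y = {0, 1} under the uniform distribution.
h = [[1, 1], [1, -1]]
mu = [[0.25, 0.25], [0.25, 0.25]]
print(discrepancy(h, mu))  # 0.5
```

A constant matrix has discrepancy 1 under every distribution (take $S = X$, $T = Y$). Small values certify that every rectangle is nearly balanced under $\mu$.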

3. Duals of approximation and sign-representation. Crucial to our work 
are the dual characterizations of the uniform approximation and sign-representation of 
Boolean functions by real polynomials. As a starting point, we recall a classical result 
from approximation theory due to Ioffe and Tikhomirov [25] on the duality of norms. A more recent treatment is available in the textbook of DeVore and Lorentz [18], p. 61, Thm. 1.3. We provide a short and elementary proof of this result in Euclidean space, which will suffice for our purposes. We let $\mathbb{R}^X$ stand for the linear space of real functions on the set $X$.

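Theorem 3.1 (Ioffe and Tikhomirov). Let $X$ be a finite set. Fix $\Phi \subseteq \mathbb{R}^X$ and a function $f\colon X \to \mathbb{R}$. Then

$$\min_{\phi \in \operatorname{span}(\Phi)}\|f - \phi\|_\infty = \max_{\psi}\left\{\sum_{x \in X} f(x)\psi(x)\right\}, \tag{3.1}$$

where the maximum is over all functions $\psi\colon X \to \mathbb{R}$ such that

$$\sum_{x \in X}|\psi(x)| \le 1$$

and, for each $\phi \in \Phi$,

$$\sum_{x \in X}\phi(x)\psi(x) = 0.$$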

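Proof. The theorem holds trivially when $\operatorname{span}(\Phi) = \{0\}$. Otherwise, let $\phi_1, \ldots, \phi_k$ be a basis for $\operatorname{span}(\Phi)$. Observe that the left member of (3.1) is the optimum of the following linear program in the variables $\epsilon, \alpha_1, \ldots, \alpha_k$:

$$\begin{array}{lll}
\text{minimize:} & \epsilon & \\
\text{subject to:} & \left|f(x) - \sum_{i=1}^{k}\alpha_i\phi_i(x)\right| \le \epsilon & \text{for each } x \in X,\\
& \alpha_i \in \mathbb{R} & \text{for each } i,\\
& \epsilon \ge 0. &
\end{array}$$

Standard manipulations reveal the dual:

$$\begin{array}{lll}
\text{maximize:} & \sum_{x \in X}\psi_x f(x) & \\
\text{subject to:} & \sum_{x \in X}|\psi_x| \le 1; & \\
& \sum_{x \in X}\psi_x\phi_i(x) = 0 & \text{for each } i,\\
& \psi_x \in \mathbb{R} & \text{for each } x \in X.
\end{array}$$

Both programs are clearly feasible and thus have the same finite optimum. We have already observed that the optimum of the first program is the left-hand side of (3.1). Since $\phi_1, \ldots, \phi_k$ form a basis for $\operatorname{span}(\Phi)$, the optimum of the second program is by definition the right-hand side of (3.1). □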



As a corollary to Theorem 3.1, we obtain a dual characterization of the approximate degree.
Theorem 3.2 (approximate degree). Fix $\epsilon \ge 0$. Let $f\colon \{0,1\}^n \to \mathbb{R}$ be given, $d = \deg_\epsilon(f) \ge 1$. Then there is a function $\psi\colon \{0,1\}^n \to \mathbb{R}$ such that

$$\hat\psi(S) = 0 \quad (|S| < d), \qquad \sum_{x \in \{0,1\}^n}|\psi(x)| = 1, \qquad \sum_{x \in \{0,1\}^n}\psi(x)f(x) > \epsilon.$$

Proof. Set $X = \{0,1\}^n$ and $\Phi = \{\chi_S : |S| < d\} \subset \mathbb{R}^X$. Since $\deg_\epsilon(f) = d$, we conclude that

$$\min_{\phi \in \operatorname{span}(\Phi)}\|f - \phi\|_\infty > \epsilon.$$

In view of Theorem 3.1, we can take $\psi$ to be any function for which the maximum is achieved in (3.1). □
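As a concrete instance of Theorems 3.1 and 3.2, here is a toy check of our own: the parity function on $n$ bits has the dual witness $\psi = \chi_{[n]}/2^n$. It is orthogonal to every character of degree less than $n$, it has $\ell_1$ norm 1, and its correlation with parity is 1. By Theorem 3.1, any such $\psi$ lower-bounds the approximation error, so parity cannot be approximated within any $\epsilon < 1$ by polynomials of degree less than $n$. The Python sketch below (helper names are ours, indices 0-based) verifies the three conditions for $n = 4$.

```python
from itertools import combinations, product

def chi(S, x):
    """Character chi_S(x) = (-1)^{sum_{i in S} x_i} (0-based indices)."""
    return -1 if sum(x[i] for i in S) % 2 else 1

n = 4
cube = list(product((0, 1), repeat=n))
f = {x: chi(range(n), x) for x in cube}     # parity on n bits, in +-1 form
psi = {x: f[x] / 2 ** n for x in cube}      # candidate witness psi = chi_[n] / 2^n

low_degree = [S for r in range(n) for S in combinations(range(n), r)]   # all |S| < n
orthogonal = all(sum(psi[x] * chi(S, x) for x in cube) == 0 for S in low_degree)
l1_norm = sum(abs(v) for v in psi.values())
correlation = sum(psi[x] * f[x] for x in cube)
print(orthogonal, l1_norm, correlation)  # True 1.0 1.0
```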



We now state the dual characterization of the threshold degree, which is better known as Gordan's Transposition Theorem [57, §7.8].

Theorem 3.3 (threshold degree). Let $f\colon \{0,1\}^n \to \{-1,+1\}$ be given, $d = \deg_\pm(f)$. Then there is a distribution $\mu$ over $\{0,1\}^n$ with

$$\mathop{\mathbf{E}}_{x\sim\mu}\big[f(x)\chi_S(x)\big] = 0 \qquad (|S| < d).$$

See [?] for a derivation of Theorem 3.3 using linear programming duality. Alternately, it can be derived as a corollary to Theorem 3.1. We close this section with one final dual characterization, corresponding to sign-representation by integer polynomials.

Theorem 3.4 (threshold weight). Fix a function $f\colon \{0,1\}^n \to \{-1,+1\}$ and an integer $d \ge \deg_\pm(f)$. Then for every distribution $\mu$ on $\{0,1\}^n$,

$$\max_{|S| \le d}\ \left|\mathop{\mathbf{E}}_{x\sim\mu}\big[f(x)\chi_S(x)\big]\right| \ge \frac{1}{W(f,d)}. \tag{3.2}$$

Furthermore, there exists a distribution $\mu$ such that

$$\max_{|S| \le d}\ \left|\mathop{\mathbf{E}}_{x\sim\mu}\big[f(x)\chi_S(x)\big]\right| \le \left(\frac{2n}{W(f,d)}\right)^{1/2}. \tag{3.3}$$

Inequalities (3.2) and (3.3) are originally due to Hajnal et al. [25] and Freund [?], respectively. For an integrated treatment of both results, see Goldmann et al. [?], Lem. 4 and Thm. 10.
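Inequality (3.2) can be spot-checked numerically. The Python sketch below uses a toy example of ours: the majority-type function $f(z) = \operatorname{sgn}(\chi_{\{1\}}(z) + \chi_{\{2\}}(z) + \chi_{\{3\}}(z))$ on three bits. Its integer representation has total weight 3, so $W(f,1) \le 3$, and (3.2) predicts that under every distribution some character of degree at most 1 has correlation at least $1/W(f,1) \ge 1/3$ with $f$. Sampling random distributions is only a sanity check, not a proof.

```python
import random
from itertools import combinations, product

def chi(S, z):
    """Character chi_S(z) = (-1)^{sum_{i in S} z_i}; indices are 0-based."""
    return -1 if sum(z[i] for i in S) % 2 else 1

n, d = 3, 1
cube = list(product((0, 1), repeat=n))
lam = {(0,): 1, (1,): 1, (2,): 1}          # integer coefficients, total weight 3
W = sum(abs(v) for v in lam.values())
f = {z: 1 if sum(v * chi(S, z) for S, v in lam.items()) > 0 else -1 for z in cube}

low = [S for r in range(d + 1) for S in combinations(range(n), r)]   # all |S| <= d
rng = random.Random(0)
for _ in range(1000):
    raw = [rng.random() for _ in cube]
    total = sum(raw)
    mu = {z: r / total for z, r in zip(cube, raw)}
    best = max(abs(sum(mu[z] * f[z] * chi(S, z) for z in cube)) for S in low)
    assert best >= 1 / W - 1e-12          # (3.2), using 1/W <= 1/W(f, 1)
print("(3.2) held for all sampled distributions")
```

The underlying argument is one line: $1 \le \mathbf{E}_\mu\big|\sum_S \lambda_S\chi_S\big| = \sum_S \lambda_S\,\mathbf{E}_\mu[f\chi_S] \le W \max_S |\mathbf{E}_\mu[f\chi_S]|$.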

4. Pattern matrices. We now turn to the second ingredient of our proof, a 
certain family of real matrices that we introduce. Our goal here is to explicitly calculate 
their singular values. As we shall see later, this provides a convenient means to generate 
hard communication problems. 

Let $t$ and $n$ be positive integers, where $t < n$ and $t \mid n$. Partition $[n]$ into $t$ contiguous blocks, each with $n/t$ elements:

$$[n] = \left\{1, 2, \ldots, \frac{n}{t}\right\} \cup \left\{\frac{n}{t} + 1, \ldots, \frac{2n}{t}\right\} \cup \cdots \cup \left\{\frac{(t-1)n}{t} + 1, \ldots, n\right\}.$$

Let $\mathcal{V}(n,t)$ denote the family of subsets $V \subseteq [n]$ that have exactly one element in each of these blocks (in particular, $|V| = t$). Clearly, $|\mathcal{V}(n,t)| = (n/t)^t$. For a bit string $x \in \{0,1\}^n$ and a set $V \in \mathcal{V}(n,t)$, define the projection of $x$ onto $V$ by

$$x|_V = (x_{i_1}, x_{i_2}, \ldots, x_{i_t}) \in \{0,1\}^t,$$

where $i_1 < i_2 < \cdots < i_t$ are the elements of $V$. We are ready for a formal definition of our matrix family.

Definition 4.1 (pattern matrix). For $\phi\colon \{0,1\}^t \to \mathbb{R}$, the $(n,t,\phi)$-pattern matrix is the real matrix $A$ given by

$$A = \Big[\phi(x|_V \oplus w)\Big]_{x \in \{0,1\}^n,\; (V,w) \in \mathcal{V}(n,t) \times \{0,1\}^t}.$$

In words, $A$ is the matrix of size $2^n$ by $(n/t)^t 2^t$ whose rows are indexed by strings $x \in \{0,1\}^n$, whose columns are indexed by pairs $(V,w) \in \mathcal{V}(n,t) \times \{0,1\}^t$, and whose entries are given by $A_{x,(V,w)} = \phi(x|_V \oplus w)$.
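A direct transcription of Definition 4.1 may help fix the indexing. In the Python sketch below, the helper name `pattern_matrix` is ours, indices are 0-based, rows are listed in lexicographic order of $x$, and columns are ordered by $V$ and then by $w$. Any fixed ordering of rows and columns gives a matrix with the same singular values.

```python
from itertools import product

def pattern_matrix(n, t, phi):
    """Return the (n, t, phi)-pattern matrix as a list of rows (hypothetical helper).

    Rows are indexed by x in {0,1}^n; columns by pairs (V, w), where V picks one index
    from each of the t contiguous blocks of size n // t, and w is in {0,1}^t.
    The entry in row x, column (V, w) is phi(x|_V XOR w).  Indices are 0-based.
    """
    if n % t != 0:
        raise ValueError("t must divide n")
    size = n // t
    blocks = [range(i * size, (i + 1) * size) for i in range(t)]
    family = list(product(*blocks))      # V(n, t); each tuple is increasing
    cols = [(V, w) for V in family for w in product((0, 1), repeat=t)]
    return [[phi(tuple(x[i] ^ b for i, b in zip(V, w))) for V, w in cols]
            for x in product((0, 1), repeat=n)]

# Example: phi = AND on t = 2 bits (with 0/1 values), n = 4.
A = pattern_matrix(4, 2, lambda z: z[0] & z[1])
print(len(A), len(A[0]))  # 16 16, i.e. 2^n rows and (n/t)^t 2^t columns
```

For each fixed column, $x|_V \oplus w$ is uniformly distributed over $\{0,1\}^t$ as $x$ ranges over $\{0,1\}^n$. So every column of this example contains exactly four 1 entries.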

The logic behind the term "pattern matrix" is as follows: a mosaic arises from repetitions of a pattern in the same way that $A$ arises from applications of $\phi$ to various subsets of the variables. Our approach to analyzing the singular values of a pattern matrix $A$ will be to represent it as the sum of simpler matrices and analyze them instead. For this to work, we should be able to reconstruct the singular values of $A$ from those of the simpler matrices. Just when this can be done is the subject of the following lemma.

Lemma 4.2 (singular values of a matrix sum). Let $A, B$ be real matrices with $AB^{\mathsf T} = 0$ and $A^{\mathsf T}B = 0$. Then the nonzero singular values of $A + B$, counting multiplicities, are $\sigma_1(A), \ldots, \sigma_{\operatorname{rk}A}(A), \sigma_1(B), \ldots, \sigma_{\operatorname{rk}B}(B)$.

Proof. The claim is trivial when $A = 0$ or $B = 0$, so assume otherwise. Since the singular values of $A + B$ are precisely the square roots of the eigenvalues of $(A+B)(A+B)^{\mathsf T}$, it suffices to compute the spectrum of the latter matrix. Now,

$$(A+B)(A+B)^{\mathsf T} = AA^{\mathsf T} + BB^{\mathsf T} + \underbrace{AB^{\mathsf T}}_{=0} + \underbrace{BA^{\mathsf T}}_{=0} = AA^{\mathsf T} + BB^{\mathsf T}. \tag{4.1}$$



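Fix spectral decompositions

$$AA^{\mathsf T} = \sum_{i=1}^{\operatorname{rk}A}\sigma_i(A)^2\,u_iu_i^{\mathsf T}, \qquad BB^{\mathsf T} = \sum_{j=1}^{\operatorname{rk}B}\sigma_j(B)^2\,v_jv_j^{\mathsf T}.$$

Then

$$\sum_{i=1}^{\operatorname{rk}A}\sum_{j=1}^{\operatorname{rk}B}\sigma_i(A)^2\,\sigma_j(B)^2\,\langle u_i, v_j\rangle^2 = \langle AA^{\mathsf T}, BB^{\mathsf T}\rangle = \operatorname{tr}(AA^{\mathsf T}BB^{\mathsf T}) = \operatorname{tr}(A\cdot A^{\mathsf T}B\cdot B^{\mathsf T}) = 0. \tag{4.2}$$

Since $\sigma_i(A)\,\sigma_j(B) > 0$ for all $i, j$, it follows from (4.2) that $\langle u_i, v_j\rangle = 0$ for all $i, j$. Put differently, the vectors $u_1, \ldots, u_{\operatorname{rk}A}, v_1, \ldots, v_{\operatorname{rk}B}$ form an orthonormal set. Recalling (4.1), we conclude that the spectral decomposition of $(A+B)(A+B)^{\mathsf T}$ is

$$\sum_{i=1}^{\operatorname{rk}A}\sigma_i(A)^2\,u_iu_i^{\mathsf T} + \sum_{j=1}^{\operatorname{rk}B}\sigma_j(B)^2\,v_jv_j^{\mathsf T},$$

and thus the nonzero eigenvalues of $(A+B)(A+B)^{\mathsf T}$ are as claimed. □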
We are ready for the main result of this section. 

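Theorem 4.3 (singular values of a pattern matrix). Let $\phi\colon \{0,1\}^t \to \mathbb{R}$ be given. Let $A$ be the $(n,t,\phi)$-pattern matrix. Then the nonzero singular values of $A$, counting multiplicities, are:

$$\bigcup_{S\,:\,\hat\phi(S) \ne 0}\left\{\sqrt{2^{n+t}\left(\frac{n}{t}\right)^{t}}\cdot|\hat\phi(S)|\left(\frac{t}{n}\right)^{|S|/2},\ \text{repeated } \left(\frac{n}{t}\right)^{|S|} \text{ times}\right\}.$$

In particular,

$$\|A\| = \sqrt{2^{n+t}\left(\frac{n}{t}\right)^{t}}\ \max_{S \subseteq [t]}\left\{|\hat\phi(S)|\left(\frac{t}{n}\right)^{|S|/2}\right\}.$$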



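Proof. For each $S \subseteq [t]$, let $A_S$ be the $(n,t,\chi_S)$-pattern matrix. Thus,

$$A = \sum_{S \subseteq [t]}\hat\phi(S)\,A_S. \tag{4.3}$$

Fix arbitrary $S, T \subseteq [t]$ with $S \ne T$. Then

$$A_SA_T^{\mathsf T} = \left[\sum_{V \in \mathcal{V}(n,t)}\sum_{w \in \{0,1\}^t}\chi_S(x|_V \oplus w)\,\chi_T(y|_V \oplus w)\right]_{x,y} = \left[\sum_{V \in \mathcal{V}(n,t)}\chi_S(x|_V)\,\chi_T(y|_V)\underbrace{\sum_{w \in \{0,1\}^t}\chi_S(w)\,\chi_T(w)}_{=0}\right]_{x,y} = 0. \tag{4.4}$$

Similarly,

$$A_S^{\mathsf T}A_T = \left[\chi_S(w)\,\chi_T(w')\underbrace{\sum_{x \in \{0,1\}^n}\chi_S(x|_V)\,\chi_T(x|_{V'})}_{=0}\right]_{(V,w),\,(V',w')} = 0. \tag{4.5}$$

By (4.3)–(4.5) and Lemma 4.2, the nonzero singular values of $A$ are the union of the nonzero singular values of all $\hat\phi(S)A_S$, counting multiplicities. Therefore, the proof will be complete once we show that the only nonzero singular value of $A_S^{\mathsf T}A_S$ is $2^{n+t}(n/t)^{t-|S|}$, with multiplicity $(n/t)^{|S|}$. It is convenient to write this matrix as the Kronecker product

$$A_S^{\mathsf T}A_S = \Big[\chi_S(w)\,\chi_S(w')\Big]_{w,w'} \otimes \left[\sum_{x \in \{0,1\}^n}\chi_S(x|_V)\,\chi_S(x|_{V'})\right]_{V,V'}.$$

The first matrix in this factorization has rank 1 and entries $\pm1$, which means that its only nonzero singular value is $2^t$ with multiplicity 1. The other matrix, call it $M$, is permutation-similar to the block-diagonal matrix

$$2^n\begin{bmatrix} J & & \\ & \ddots & \\ & & J\end{bmatrix}$$

with $(n/t)^{|S|}$ diagonal blocks, where $J$ is the all-ones square matrix of order $(n/t)^{t-|S|}$. This means that the only nonzero singular value of $M$ is $2^n(n/t)^{t-|S|}$ with multiplicity $(n/t)^{|S|}$. It follows from elementary properties of the Kronecker product that the spectrum of $A_S^{\mathsf T}A_S$ is as claimed. □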
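Theorem 4.3 can be tested on small instances without computing an SVD. If the nonzero singular values are exactly as stated, then $\sum_i\sigma_i^2 = \|A\|_F^2$ and $\sum_i\sigma_i^4 = \|AA^{\mathsf T}\|_F^2$ must equal the values predicted by the theorem, including multiplicities. The Python sketch below is our own test harness with an arbitrarily chosen integer-valued $\phi$; it performs this comparison in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations, product

def pattern_matrix(n, t, phi):
    """(n, t, phi)-pattern matrix, rows x in {0,1}^n, columns (V, w); 0-based indices."""
    size = n // t
    blocks = [range(i * size, (i + 1) * size) for i in range(t)]
    cols = [(V, w) for V in product(*blocks) for w in product((0, 1), repeat=t)]
    return [[phi(tuple(x[i] ^ b for i, b in zip(V, w))) for V, w in cols]
            for x in product((0, 1), repeat=n)]

def fourier(phi, t):
    """phi_hat(S) = 2^{-t} sum_z phi(z) chi_S(z), with chi_S(z) = (-1)^{sum_{i in S} z_i}."""
    cube = list(product((0, 1), repeat=t))
    return {S: Fraction(sum(phi(z) * (-1) ** sum(z[i] for i in S) for z in cube), 2 ** t)
            for r in range(t + 1) for S in combinations(range(t), r)}

def power_sums(A):
    """(sum of sigma_i^2, sum of sigma_i^4) = (||A||_F^2, ||A A^T||_F^2), computed exactly."""
    G = [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]   # A A^T
    return sum(a * a for row in A for a in row), sum(g * g for row in G for g in row)

def predicted_power_sums(n, t, phi):
    """The same power sums, computed from the singular values listed in Theorem 4.3."""
    k = Fraction(n, t)
    s2 = s4 = Fraction(0)
    for S, c in fourier(phi, t).items():
        sigma_sq = 2 ** (n + t) * k ** (t - len(S)) * c * c   # square of the value for S
        s2 += k ** len(S) * sigma_sq                          # multiplicity (n/t)^{|S|}
        s4 += k ** len(S) * sigma_sq ** 2
    return s2, s4

phi = lambda z: 3 * z[0] - z[1] + 2 * z[0] * z[1]   # an arbitrary integer-valued phi
print(power_sums(pattern_matrix(4, 2, phi)) == predicted_power_sums(4, 2, phi))  # True
```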

5. Pattern matrix method using uniform approximation. The previous 
two sections examined relevant dual representations and the spectrum of pattern 
matrices. Having studied these notions in their pure and basic form, we now apply 
our findings to communication complexity. Specifically, we establish the pattern matrix 
method for communication complexity, which gives strong lower bounds for every 
pattern matrix generated by a Boolean function with high approximate degree. 

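Theorem 1.1 (restated from p. 3). Let $F$ be the $(n,t,f)$-pattern matrix, where $f\colon \{0,1\}^t \to \{-1,+1\}$ is given. Then for every $\epsilon \in [0,1)$ and every $\delta < \epsilon/2$,

$$Q^*_\delta(F) \ge \frac14\deg_\epsilon(f)\log\left(\frac{n}{t}\right) - \frac12\log\left(\frac{3}{\epsilon - 2\delta}\right). \tag{5.1}$$

In particular,

$$Q^*_{1/7}(F) \ge \frac14\deg_{1/3}(f)\log\left(\frac{n}{t}\right) - 3. \tag{5.2}$$

Proof. Since (5.1) immediately implies (5.2), we will focus on the former in the remainder of the proof. Let $d = \deg_\epsilon(f) \ge 1$. By Theorem 3.2, there is a function $\psi\colon \{0,1\}^t \to \mathbb{R}$ such that:

$$\hat\psi(S) = 0 \quad (|S| < d), \tag{5.3}$$

$$\sum_{z \in \{0,1\}^t}|\psi(z)| = 1, \tag{5.4}$$

$$\sum_{z \in \{0,1\}^t}\psi(z)f(z) > \epsilon. \tag{5.5}$$

Let $\Psi$ be the $(n, t, 2^{-n}(n/t)^{-t}\psi)$-pattern matrix. Then (5.4) and (5.5) show that

$$\|\Psi\|_1 = 1, \qquad \langle F, \Psi\rangle > \epsilon. \tag{5.6}$$

Our last task is to calculate $\|\Psi\|$. By (5.4) and Proposition 2.1,

$$\max_{S \subseteq [t]}|\hat\psi(S)| \le 2^{-t}. \tag{5.7}$$

Theorem 4.3 yields, in view of (5.3) and (5.7),

$$\|\Psi\| \le \left(\frac{t}{n}\right)^{d/2}\left(2^{n+t}\left(\frac{n}{t}\right)^{t}\right)^{-1/2}. \tag{5.8}$$

Now (5.1) follows from (5.6), (5.8), and Theorem 2.8. □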
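The constants in (5.1) and (5.2) are easy to trace. The Python helper below (its name is ours) evaluates the right-hand side of (5.1), taking logarithms to base 2 as is customary for communication cost. It also confirms the arithmetic behind (5.2): with $\epsilon = 1/3$ and $\delta = 1/7$ one has $\epsilon - 2\delta = 1/21$, and the subtracted term $\frac12\log 63$ is just under 3.

```python
from math import log2

def rhs_51(deg_eps, n, t, eps, delta):
    """Right-hand side of (5.1): (1/4) deg_eps(f) log(n/t) - (1/2) log(3 / (eps - 2 delta))."""
    if not 0 <= delta < eps / 2:
        raise ValueError("(5.1) requires 0 <= delta < eps/2")
    return deg_eps * log2(n / t) / 4 - log2(3 / (eps - 2 * delta)) / 2

# With eps = 1/3 and delta = 1/7, the subtracted term is (1/2) log2(63) < 3,
# so (5.1) implies (5.2).
print(log2(63) / 2 < 3)  # True
print(rhs_51(8, 64, 4, 1/3, 1/7) >= 8 * log2(64 / 4) / 4 - 3)  # True
```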



Theorem 1.1 gives lower bounds not only for bounded-error communication but also for communication protocols with error probability $\frac12 - o(1)$. For example, if a function $f\colon \{0,1\}^t \to \{-1,+1\}$ requires a polynomial of degree $d$ for approximation within $1 - o(1)$, equation (5.1) gives a lower bound for small-bias communication. We will complement and refine that estimate in the next section, which is dedicated to small-bias communication.

We now prove the corollary to Theorem 1.1 on function composition, stated in the introduction.

Proof of Corollary 1.2. The $(2t, t, f)$-pattern matrix occurs as a submatrix of $[F(x,y)]_{x,y \in \{0,1\}^{4t}}$. □



Finally, we show that the lower bound (5.2) derived above for bounded-error communication complexity is tight up to a polynomial factor, even for deterministic protocols. The proof follows a well-known argument in the literature [?], as pointed out to us by R. de Wolf [?].

Proposition 5.1 (on the tightness of Theorem 1.1). Let $F$ be the $(n,t,f)$-pattern matrix, where $f\colon \{0,1\}^t \to \{-1,+1\}$ is given. Then

$$D(F) \le O\!\left(\operatorname{dt}(f)\log\frac{n}{t}\right) \le O\!\left(\deg_{1/3}(f)^6\log\frac{n}{t}\right),$$

where $\operatorname{dt}(f)$ is the least depth of a decision tree for $f$. In particular, (5.2) is tight up to a polynomial factor.

Proof. Beals et al. [5, Cor. 5.6] prove that $\operatorname{dt}(f) \le O(\deg_{1/3}(f)^6)$ for all Boolean functions $f$. Therefore, it suffices to prove an upper bound of $O(d\log(n/t))$ on the deterministic communication complexity of $F$, where $d = \operatorname{dt}(f)$.

The needed deterministic protocol is well known. Fix a depth-$d$ decision tree for $f$. Let $(x, (V,w))$ be a given input. Alice and Bob start at the root of the decision tree, labeled by some variable $i \in \{1, \ldots, t\}$. By exchanging $\lceil\log(n/t)\rceil + 2$ bits, Alice and Bob determine $(x|_V)_i \oplus w_i \in \{0,1\}$ and take the corresponding branch of the tree. The process repeats until a leaf is reached, at which point both parties learn $f(x|_V \oplus w)$. □
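The protocol in the proof of Proposition 5.1 is easy to simulate. The Python sketch below uses our own toy encoding of decision trees as nested tuples, 0-based indices, and the example $f(z) = (-1)^{z_1 + z_2}$ with $t = 2$ and $n = 8$. On every input it checks that the protocol outputs $f(x|_V \oplus w)$ and that at most $\operatorname{dt}(f)\,(\lceil\log(n/t)\rceil + 2)$ bits are exchanged.

```python
from itertools import product
from math import ceil, log2

# Toy encoding (ours): a decision tree is ("leaf", value) or (i, subtree_if_0, subtree_if_1),
# where the internal node queries variable z_i (0-based).
def evaluate(tree, z):
    """Evaluate the decision tree on z in {0,1}^t."""
    while tree[0] != "leaf":
        i, zero, one = tree
        tree = one if z[i] else zero
    return tree[1]

def simulate(tree, x, V, w, n, t):
    """Run the protocol of Proposition 5.1 on input (x, (V, w)); return (output, bits sent).

    Bob, who knows (V, w), announces the position of V[i] within block i and the bit w[i];
    Alice, who knows x, replies with x[V[i]] XOR w[i].  One query costs ceil(log2(n/t)) + 2 bits.
    """
    per_query = ceil(log2(n // t)) + 2
    bits = 0
    while tree[0] != "leaf":
        i, zero, one = tree
        bits += per_query
        tree = one if x[V[i]] ^ w[i] else zero
    return tree[1], bits

n, t = 8, 2
xor_tree = (0, (1, ("leaf", 1), ("leaf", -1)), (1, ("leaf", -1), ("leaf", 1)))  # (-1)^(z0+z1)
size = n // t
family = list(product(*[range(j * size, (j + 1) * size) for j in range(t)]))      # V(n, t)

ok, max_bits = True, 0
for x in product((0, 1), repeat=n):
    for V in family:
        for w in product((0, 1), repeat=t):
            out, bits = simulate(xor_tree, x, V, w, n, t)
            z = tuple(x[i] ^ b for i, b in zip(V, w))      # x|_V XOR w
            ok = ok and out == evaluate(xor_tree, z)
            max_bits = max(max_bits, bits)
print(ok, max_bits)  # True 8
```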

6. Pattern matrix method using threshold weight. As we have already mentioned, Theorem 1.1 of the previous section can be used to obtain lower bounds not only for bounded-error communication but also small-bias communication. In the latter case, one first needs to show that the base function $f\colon \{0,1\}^t \to \{-1,+1\}$ cannot be approximated pointwise within $1 - o(1)$ by a real polynomial of a given degree $d$. In this section, we derive a different lower bound for small-bias communication, this time using the assumption that the threshold weight $W(f,d)$ is high. We will see that this new lower bound is nearly optimal and closely related to the lower bound in Theorem 1.1.
Theorem 6.1 (pattern matrix method using threshold weight). Let $F$ be the $(n,t,f)$-pattern matrix, where $f\colon \{0,1\}^t \to \{-1,+1\}$ is given. Then for every integer $d \ge 1$ and real $\gamma \in (0,1)$,

$$Q^*_{1/2-\gamma/2}(F) \ge \frac14\min\left\{d\log\frac{n}{t},\ \log\frac{W(f,d-1)}{2t}\right\} - \frac12\log\frac{3}{\gamma}. \tag{6.1}$$

In particular,

$$Q^*_{1/2-\gamma/2}(F) \ge \frac14\deg_\pm(f)\log\left(\frac{n}{t}\right) - \frac12\log\frac{3}{\gamma}. \tag{6.2}$$

Proof. Letting $d = \deg_\pm(f)$ in (6.1) yields (6.2), since $W(f, d-1) = \infty$ in that case. In the remainder of the proof, we focus on (6.1) alone.

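We claim that there exists a distribution $\mu$ on $\{0,1\}^t$ such that

$$\max_{|S| < d}\ \left|\mathop{\mathbf{E}}_{z\sim\mu}\big[f(z)\chi_S(z)\big]\right| \le \left(\frac{2t}{W(f,d-1)}\right)^{1/2}. \tag{6.3}$$

For $d \le \deg_\pm(f)$, the claim holds by Theorem 3.3, since $W(f, d-1) = \infty$ in that case. For $d > \deg_\pm(f)$, the claim holds by Theorem 3.4.

Now, define $\psi\colon \{0,1\}^t \to \mathbb{R}$ by $\psi(z) = f(z)\mu(z)$. It follows from (6.3) that

$$|\hat\psi(S)| \le 2^{-t}\left(\frac{2t}{W(f,d-1)}\right)^{1/2} \quad (|S| < d), \tag{6.4}$$

$$\sum_{z \in \{0,1\}^t}|\psi(z)| = 1, \tag{6.5}$$

$$\sum_{z \in \{0,1\}^t}\psi(z)f(z) = 1. \tag{6.6}$$

Let $\Psi$ be the $(n, t, 2^{-n}(n/t)^{-t}\psi)$-pattern matrix. Then (6.5) and (6.6) show that

$$\|\Psi\|_1 = 1, \qquad \langle F, \Psi\rangle = 1. \tag{6.7}$$

It remains to calculate $\|\Psi\|$. By (6.5) and Proposition 2.1,

$$\max_{S \subseteq [t]}|\hat\psi(S)| \le 2^{-t}. \tag{6.8}$$

Theorem 4.3 yields, in view of (6.4) and (6.8),

$$\|\Psi\| \le \max\left\{\left(\frac{t}{n}\right)^{d/2},\ \left(\frac{2t}{W(f,d-1)}\right)^{1/2}\right\}\left(2^{n+t}\left(\frac{n}{t}\right)^{t}\right)^{-1/2}. \tag{6.9}$$

Now (6.1) follows from (6.7), (6.9), and Theorem 2.8. □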

Recall from Theorem 2.5 that the quantities $E(f,d)$ and $W(f,d)$ are related for all $f$ and $d$. In particular, the lower bounds for small-bias communication in Theorems 1.1 and 6.1 are quite close, and either one can be approximately deduced from the other. In deriving both results from scratch, as we did, our motivation was to obtain the tightest bounds and to illustrate the pattern matrix method in different contexts. We will now see that the lower bound in Theorem 6.1 is close to optimal, even for classical protocols.

Theorem 6.2. Let $F$ be the $(n,t,f)$-pattern matrix, where $f\colon \{0,1\}^t \to \{-1,+1\}$ is given. Then for every integer $d \ge \deg_\pm(f)$,

$$Q^*_{1/2-\gamma/2}(F) \le R_{1/2-\gamma/2}(F) \le d\log\left(\frac{n}{t}\right) + 3,$$

where $\gamma = 1/W(f,d)$.

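Proof. The communication protocol that we will describe is standard and has been used in one form or another in several works, e.g., [?]. Put $W = W(f,d)$ and fix a representation

$$f(z) \equiv \operatorname{sgn}\left(\sum_{S \subseteq [t],\ |S| \le d}\lambda_S\,\chi_S(z)\right),$$

where the integers $\lambda_S$ satisfy $\sum|\lambda_S| = W$. On input $(x, (V,w))$, the protocol proceeds as follows. Let $i_1 < i_2 < \cdots < i_t$ be the elements of $V$. Alice and Bob use their shared randomness to pick a set $S \subseteq [t]$ with $|S| \le d$, according to the probability distribution $|\lambda_S|/W$. Next, Bob sends Alice the indices $\{i_j : j \in S\}$ as well as the bit $\chi_S(w)$. With this information, Alice computes the product $\operatorname{sgn}(\lambda_S)\,\chi_S(x|_V)\,\chi_S(w) = \operatorname{sgn}(\lambda_S)\,\chi_S(x|_V \oplus w)$ and announces the result as the output of the protocol.

Assuming an optimal encoding of the messages, the communication cost of this protocol is bounded by

$$\left\lceil\log\left(\frac{n}{t}\right)^{d}\right\rceil + 2 \le d\log\left(\frac{n}{t}\right) + 3,$$

as desired. On each input $x, V, w$, the output of the protocol is a random variable $P(x,V,w) \in \{-1,+1\}$ that obeys

$$f(x|_V \oplus w)\,\mathbf{E}[P(x,V,w)] = f(x|_V \oplus w)\sum_{|S| \le d}\frac{|\lambda_S|}{W}\operatorname{sgn}(\lambda_S)\,\chi_S(x|_V \oplus w) = \frac{1}{W}\left|\sum_{|S| \le d}\lambda_S\,\chi_S(x|_V \oplus w)\right| \ge \frac{1}{W},$$

where the last step holds because $\sum_S \lambda_S\chi_S(x|_V \oplus w)$ is a nonzero integer. This means that the protocol produces the correct answer with probability $\frac12 + \frac{1}{2W}$ or greater. □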
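The key calculation in the proof of Theorem 6.2 can be carried out exactly for a toy representation. In the Python sketch below (our own example, 0-based indices), $f(z) = \operatorname{sgn}(\chi_{\{1\}}(z) + \chi_{\{2\}}(z) + \chi_{\{3\}}(z))$ on $t = 3$ bits, and the representation has weight $W = 3$. The output depends on $x$ only through $x|_V$, so it suffices to range over $x|_V, w \in \{0,1\}^3$. The script computes $f(x|_V \oplus w)\,\mathbf{E}[P]$ in exact arithmetic and confirms that it is never below $1/W$.

```python
from fractions import Fraction
from itertools import product

def chi(S, z):
    """Character chi_S(z) = (-1)^{sum_{i in S} z_i}; indices are 0-based."""
    return -1 if sum(z[i] for i in S) % 2 else 1

t = 3
lam = {(0,): 1, (1,): 1, (2,): 1}          # toy integer representation of degree 1
W = sum(abs(v) for v in lam.values())      # its weight, here 3

def f(z):
    return 1 if sum(v * chi(S, z) for S, v in lam.items()) > 0 else -1

def expected_output(xV, w):
    """E[P] when S is drawn with probability |lam_S| / W and Alice outputs
    sgn(lam_S) * chi_S(x|_V) * chi_S(w)."""
    return sum(Fraction(abs(v), W) * (1 if v > 0 else -1) * chi(S, xV) * chi(S, w)
               for S, v in lam.items())

worst = min(f(tuple(a ^ b for a, b in zip(xV, w))) * expected_output(xV, w)
            for xV in product((0, 1), repeat=t) for w in product((0, 1), repeat=t))
print(worst, worst >= Fraction(1, W))  # 1/3 True
```

Exact rational arithmetic avoids sampling error. The minimum $1/3$ is attained when the three characters do not all agree.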
7. Discrepancy of pattern matrices. We now restate some of the results of the previous section in terms of discrepancy, a key notion already mentioned in Section 2.4. This quantity figures prominently in the study of small-bias communication as well as various applications, such as learning theory and circuit complexity.

For a Boolean function $f\colon X \times Y \to \{-1,+1\}$ and a probability distribution $\lambda$ on $X \times Y$, the discrepancy of $f$ under $\lambda$ is defined by

$$\operatorname{disc}_\lambda(f) = \max_{S \subseteq X,\ T \subseteq Y}\left|\sum_{x \in S}\sum_{y \in T}\lambda(x,y)\,f(x,y)\right|.$$

We put

$$\operatorname{disc}(f) = \min_\lambda\operatorname{disc}_\lambda(f).$$

As usual, we will identify a function $f\colon X \times Y \to \{-1,+1\}$ with its communication matrix $F = [f(x,y)]_{x,y}$ and use the conventions $\operatorname{disc}_\lambda(F) = \operatorname{disc}_\lambda(f)$ and $\operatorname{disc}(F) = \operatorname{disc}(f)$.

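The above definition of discrepancy is not convenient to work with, and we will use a well-known matrix-analytic reformulation; cf. Kushilevitz & Nisan [40, Ex. 3.29]. For matrices $A = [A_{xy}]$ and $B = [B_{xy}]$, recall that their Hadamard product is given by $A \circ B = [A_{xy}B_{xy}]$.

Proposition 7.1. Let $X, Y$ be finite sets, $f\colon X \times Y \to \{-1,+1\}$ a given function. Then

$$\operatorname{disc}_P(f) \le \sqrt{|X|\,|Y|}\ \|P \circ F\|,$$

where $F = [f(x,y)]_{x \in X,\, y \in Y}$ and $P$ is any matrix whose entries are nonnegative and sum to 1 (viewed as a probability distribution). In particular,

$$\operatorname{disc}(f) \le \sqrt{|X|\,|Y|}\ \min_P\|P \circ F\|,$$

where the minimum is over matrices $P$ whose entries are nonnegative and sum to 1.

Proof. Writing $\mathbf{1}_S$ and $\mathbf{1}_T$ for the characteristic vectors of $S \subseteq X$ and $T \subseteq Y$, we have

$$\operatorname{disc}_P(f) = \max_{S,T}\left|\mathbf{1}_S^{\mathsf T}(P \circ F)\,\mathbf{1}_T\right| \le \max_{S,T}\left\{\|\mathbf{1}_S\|\cdot\|P \circ F\|\cdot\|\mathbf{1}_T\|\right\} \le \|P \circ F\|\sqrt{|X|\,|Y|},$$

as claimed. □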
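Proposition 7.1 can be compared with brute force on a small example. The Python sketch below is ours. It estimates the spectral norm by power iteration, which is adequate for tiny matrices but is not a robust SVD, and it uses the $4 \times 4$ inner-product (Sylvester–Hadamard) sign matrix $F$ under the uniform distribution $P$.

```python
from itertools import chain, combinations

def subsets(items):
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def disc_brute(M):
    """max over S x T of |sum_{x in S, y in T} M[x][y]|, for M = P o F (brute force)."""
    rows, cols = list(range(len(M))), list(range(len(M[0])))
    return max(abs(sum(M[x][y] for x in S for y in T))
               for S in subsets(rows) for T in subsets(cols))

def spectral_norm(M, iters=200):
    """Largest singular value via power iteration on M^T M (fine for tiny matrices)."""
    m, k = len(M), len(M[0])
    v = [1.0 + 0.1 * j for j in range(k)]          # generic starting vector
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(k)) for i in range(m)]
        v = [sum(M[i][j] * u[i] for i in range(m)) for j in range(k)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    Mv = [sum(M[i][j] * v[j] for j in range(k)) for i in range(m)]
    return sum(c * c for c in Mv) ** 0.5

# F[x][y] = (-1)^{<x, y> mod 2} for x, y in {0,1}^2, and P uniform, so P o F = F / 16.
F = [[(-1) ** bin(x & y).count("1") for y in range(4)] for x in range(4)]
PF = [[F[x][y] / 16 for y in range(4)] for x in range(4)]
disc = disc_brute(PF)
bound = (4 * 4) ** 0.5 * spectral_norm(PF)
print(round(disc, 6), round(bound, 6))  # 0.3125 0.5
```

Here the spectral bound $\sqrt{16}\cdot\frac{2}{16} = \frac12$ exceeds the true value $\frac{5}{16}$, as Proposition 7.1 requires.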

We will need one last ingredient, a well-known lower bound on communication complexity in terms of discrepancy.

Proposition 7.2 (see [40, pp. 36–38]). For every function $f\colon X \times Y \to \{-1,+1\}$ and every $\gamma \in (0,1)$,

$$R_{1/2-\gamma/2}(f) \ge \log\frac{\gamma}{\operatorname{disc}(f)}.$$

Using Theorems 6.1 and 6.2, we will now characterize the discrepancy of pattern matrices in terms of threshold weight.

Theorem 7.3 (discrepancy of pattern matrices). Let $F$ be the $(n,t,f)$-pattern matrix, where $f\colon \{0,1\}^t \to \{-1,+1\}$ is given. Then for every integer $d \ge 1$,

$$\operatorname{disc}(F) \ge \frac{1}{8W(f,d)}\left(\frac{t}{n}\right)^{d} \tag{7.1}$$

and

$$\operatorname{disc}(F)^2 \le \max\left\{\frac{2t}{W(f,d-1)},\ \left(\frac{t}{n}\right)^{d}\right\}. \tag{7.2}$$

In particular,

$$\operatorname{disc}(F) \le \left(\frac{t}{n}\right)^{\deg_\pm(f)/2}. \tag{7.3}$$

Proof. The lower bound (7.1) is immediate from Theorem 6.2 and Proposition 7.2. For the upper bound (7.2), construct the matrix $\Psi$ as in the proof of Theorem 6.1. Then (6.7) shows that $\Psi = F \circ P$ for a nonnegative matrix $P$ whose entries sum to 1. As a result, (7.2) follows from (6.9) and Proposition 7.1. Finally, (7.3) follows by taking $d = \deg_\pm(f)$ in (7.2), since $W(f, d-1) = \infty$ in that case. □



This settles Theorem 1.5 from the introduction. Theorem 7.3 follows up and considerably improves on our earlier result, the Degree/Discrepancy Theorem [60]:

Theorem 7.4 (Sherstov). Let $f\colon \{0,1\}^t \to \{-1,+1\}$ be given. Fix an integer $n \ge t$. Let $M = [f(x|_S)]_{x,S}$, where the row index $x$ ranges over $\{0,1\}^n$ and the column index $S$ ranges over all $t$-element subsets of $\{1, 2, \ldots, n\}$. Then

$$\operatorname{disc}(M) \le \left(\frac{4et^2}{n\deg_\pm(f)}\right)^{\deg_\pm(f)/2}.$$

Note that (7.3) is already stronger than Theorem 7.4. In Section 10 we will see an example when Theorem 7.3 gives an exponential improvement on Theorem 7.4. Threshold weight is typically easier to analyze than the approximate degree. For completeness, however, we will now supplement Theorem 7.3 with an alternate bound on the discrepancy of a pattern matrix in terms of the approximate degree.

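Theorem 7.5. Let $F$ be the $(n,t,f)$-pattern matrix, for a given function $f\colon \{0,1\}^t \to \{-1,+1\}$. Then for every $\gamma > 0$,

$$\operatorname{disc}(F) \le \gamma + \left(\frac{t}{n}\right)^{\deg_{1-\gamma}(f)/2}.$$

Proof. Let $d = \deg_{1-\gamma}(f) \ge 1$. Define $\epsilon = 1 - \gamma$ and construct the matrix $\Psi$ as in the proof of Theorem 1.1. Then (5.6) shows that $\Psi = H \circ P$, where $H$ is a sign matrix and $P$ is a nonnegative matrix whose entries sum to 1. Viewing $P$ as a probability distribution, we infer from (5.8) and Proposition 7.1 that

$$\operatorname{disc}_P(H) \le \left(\frac{t}{n}\right)^{d/2}. \tag{7.4}$$

Moreover,

$$\begin{aligned}
\operatorname{disc}_P(F) &\le \operatorname{disc}_P(H) + \|(F - H) \circ P\|_1\\
&= \operatorname{disc}_P(H) + 1 - \langle F, H \circ P\rangle\\
&\le \operatorname{disc}_P(H) + \gamma,
\end{aligned} \tag{7.5}$$

where the last step follows because $\langle F, \Psi\rangle > \epsilon = 1 - \gamma$ by (5.6). The proof is complete in view of (7.4) and (7.5). □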
8. Approximate rank and trace norm of pattern matrices. We will now use the results of the previous sections to analyze the approximate rank and approximate trace norm of pattern matrices. These notions were originally motivated by lower bounds on quantum communication [66], [12]. However, they also arise in learning theory [35] and are natural matrix-analytic quantities in their own right. In particular, Klivans and Sherstov [?] proved exponential lower bounds on the approximate rank of disjunctions, majority functions, and decision lists, with applications to agnostic learning. In what follows, we broadly generalize these results to any functions with high approximate degree or high threshold weight.

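Theorem 8.1. Let $F$ be the $(n,t,f)$-pattern matrix, where $f\colon \{0,1\}^t \to \{-1,+1\}$ is given. Let $s = 2^{n+t}(n/t)^t$ be the number of entries in $F$. Then for every $\epsilon \in [0,1)$ and every $\delta \in [0,\epsilon]$,

$$\|F\|_{\Sigma,\delta} \ge (\epsilon - \delta)\left(\frac{n}{t}\right)^{\deg_\epsilon(f)/2}\sqrt{s} \tag{8.1}$$

and

$$\operatorname{rk}_\delta F \ge \left(\frac{\epsilon - \delta}{1 + \delta}\right)^2\left(\frac{n}{t}\right)^{\deg_\epsilon(f)}. \tag{8.2}$$

Proof. We may assume that $\deg_\epsilon(f) \ge 1$, since otherwise $f$ is a constant function and the claims hold trivially. Construct $\Psi$ as in the proof of Theorem 1.1. Then the claimed lower bound on $\|F\|_{\Sigma,\delta}$ follows from (5.6), (5.8), and Proposition 2.2. Finally, (8.2) follows from (8.1) and Proposition 2.3. □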



We prove an additional lower bound in the case of small-bias approximation.

Theorem 8.2. Let $F$ be the $(n,t,f)$-pattern matrix, where $f\colon \{0,1\}^t \to \{-1,+1\}$ is given. Let $s = 2^{n+t}(n/t)^t$ be the number of entries in $F$. Then for every $\gamma \in (0,1)$ and every integer $d \ge 1$,

$$\|F\|_{\Sigma,1-\gamma} \ge \gamma\min\left\{\left(\frac{n}{t}\right)^{d/2},\ \left(\frac{W(f,d-1)}{2t}\right)^{1/2}\right\}\sqrt{s} \tag{8.3}$$

and

$$\operatorname{rk}_{1-\gamma}F \ge \left(\frac{\gamma}{2-\gamma}\right)^2\min\left\{\left(\frac{n}{t}\right)^{d},\ \frac{W(f,d-1)}{2t}\right\}. \tag{8.4}$$

In particular,

$$\|F\|_{\Sigma,1-\gamma} \ge \gamma\left(\frac{n}{t}\right)^{\deg_\pm(f)/2}\sqrt{s} \tag{8.5}$$

and

$$\operatorname{rk}_{1-\gamma}F \ge \left(\frac{\gamma}{2-\gamma}\right)^2\left(\frac{n}{t}\right)^{\deg_\pm(f)}. \tag{8.6}$$

Proof. Construct $\Psi$ as in the proof of Theorem 6.1. Then the claimed lower bound on $\|F\|_{\Sigma,1-\gamma}$ follows from (6.7), (6.9), and Proposition 2.2. Now (8.4) follows from (8.3) and Proposition 2.3. Finally, (8.5) and (8.6) follow by taking $d = \deg_\pm(f)$ in (8.3) and (8.4), respectively, since $W(f, d-1) = \infty$ in that case. □

Theorems 8.1 and 8.2 settle Theorem 1.4 from the introduction.



Recall that Theorem 4.3 gives an easy way to calculate the trace norm and rank of a pattern matrix. In particular, it is straightforward to verify that the lower bounds in (8.2) and (8.4) are close to optimal for various choices of $\epsilon, \delta, \gamma$. For example, one has $\|F - A\|_\infty \le 1/3$ by taking $F$ and $A$ to be the $(n,t,f)$- and $(n,t,\phi)$-pattern matrices, where $\phi\colon \{0,1\}^t \to \mathbb{R}$ is any polynomial of degree $\deg_{1/3}(f)$ with $\|f - \phi\|_\infty \le 1/3$.

9. Application: quantum complexity of symmetric functions. As an illustrative application of the pattern matrix method, we now give a short and elementary proof of Razborov's optimal lower bounds for every predicate $D\colon \{0,1,\ldots,n\} \to \{-1,+1\}$. We first solve the problem for all predicates $D$ that change value close to 0. Extension to the general case will require an additional step.

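Theorem 9.1. Let $D\colon \{0,1,\ldots,n\} \to \{-1,+1\}$ be a given predicate. Suppose that $D(\ell) \ne D(\ell - 1)$ for some $\ell \le \frac18 n$. Then

$$Q^*_{1/3}(D) \ge \Omega(\sqrt{n\ell}).$$

Proof. It suffices to show that $Q^*_{1/7}(D) \ge \Omega(\sqrt{n\ell})$. Define $f\colon \{0,1\}^{\lfloor n/4\rfloor} \to \{-1,+1\}$ by $f(z) = D(|z|)$. Then $\deg_{1/3}(f) \ge \Omega(\sqrt{n\ell})$ by Theorem 2.6. Theorem 1.1 implies that

$$Q^*_{1/7}(F) \ge \Omega(\sqrt{n\ell}),$$

where $F$ is the $(2\lfloor n/4\rfloor, \lfloor n/4\rfloor, f)$-pattern matrix. Since $F$ occurs as a submatrix of $[D(|x \wedge y|)]_{x,y}$, the proof is complete. □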

The remainder of this section is a simple if tedious exercise in shifting and padding. 
We note that Razborov's proof concludes in a similar way (see [55], beginning of Section 5).

Theorem 9.2. Let $D\colon \{0,1,\ldots,n\} \to \{-1,+1\}$ be a given predicate. Suppose that $D(\ell) \ne D(\ell - 1)$ for some $\ell > \frac18 n$. Then

$$Q^*_{1/3}(D) \ge c(n - \ell) \tag{9.1}$$

for some absolute constant $c > 0$.

Proof. Consider the communication problem of computing $D(|x \wedge y|)$ when the last $k$ bits in $x$ and $y$ are fixed to 1. In other words, the new problem is to compute $D_k(|x' \wedge y'|)$, where $x', y' \in \{0,1\}^{n-k}$ and the predicate $D_k\colon \{0,1,\ldots,n-k\} \to \{-1,+1\}$ is given by $D_k(i) = D(k + i)$. Since the new problem is a restricted version of the original, we have

$$Q^*_{1/3}(D) \ge Q^*_{1/3}(D_k). \tag{9.2}$$

We complete the proof by placing a lower bound on $Q^*_{1/3}(D_k)$ for

$$k = \ell - \left\lfloor\frac{\alpha}{1-\alpha}(n - \ell)\right\rfloor,$$

where $\alpha = \frac18$. Note that $k$ is an integer between 1 and $\ell$ (because $\ell > \alpha n$). The equality $k = \ell$ occurs if and only if $\left\lfloor\frac{\alpha}{1-\alpha}(n - \ell)\right\rfloor = 0$, in which case (9.1) holds trivially for $c$ suitably small. Thus, we can assume that $k < \ell$, in which case $D_k(\ell - k) \ne D_k(\ell - k - 1)$ and $\ell - k \le \alpha(n - k)$. Therefore, Theorem 9.1 is applicable to $D_k$ and yields:

$$Q^*_{1/3}(D_k) \ge C\sqrt{(n-k)(\ell-k)}, \tag{9.3}$$

where $C > 0$ is an absolute constant. Calculations reveal:

$$n - k \asymp \frac{1}{1-\alpha}(n - \ell), \qquad \ell - k \asymp \frac{\alpha}{1-\alpha}(n - \ell). \tag{9.4}$$

The theorem is now immediate from (9.2)–(9.4). □
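The two facts about the shift $k$ used in this proof are elementary. First, $1 \le k \le \ell$ whenever $\ell > \alpha n$. Second, $\ell - k \le \alpha(n - k)$. The Python sketch below (ours) confirms both exhaustively for $n \le 200$ in exact rational arithmetic. Both facts hold for any $\alpha \in (0,1)$, so the sketch checks several values of $\alpha$ rather than a single one.

```python
from fractions import Fraction
from math import floor

def shift(n, l, alpha):
    """k = l - floor(alpha / (1 - alpha) * (n - l)), as in the proof of Theorem 9.2."""
    return l - floor(alpha / (1 - alpha) * (n - l))

checked = 0
for alpha in (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)):
    for n in range(1, 201):
        for l in range(1, n + 1):
            if l > alpha * n:
                k = shift(n, l, alpha)
                assert 1 <= k <= l                  # k is a valid shift
                assert l - k <= alpha * (n - k)     # Theorem 9.1 then applies to D_k
                checked += 1
print("checked", checked, "pairs (n, l)")
```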



Together, Theorems 9.1 and 9.2 give the main result of this section:

Theorem 1.3 (restated from the introduction). Let $D\colon \{0,1,\ldots,n\} \to \{-1,+1\}$. Then

$$Q^*_{1/3}(D) \ge \Omega\!\left(\sqrt{n\,\ell_0(D)} + \ell_1(D)\right),$$

where $\ell_0(D) \in \{0, 1, \ldots, \lfloor n/2\rfloor\}$ and $\ell_1(D) \in \{0, 1, \ldots, \lceil n/2\rceil\}$ are the smallest integers such that $D$ is constant in the range $[\ell_0(D), n - \ell_1(D)]$.

Proof. If $\ell_0(D) \ne 0$, set $\ell = \ell_0(D)$ and note that $D(\ell) \ne D(\ell - 1)$ by definition. One of Theorems 9.1 and 9.2 must be applicable, and therefore $Q^*_{1/3}(D) \ge \min\{\Omega(\sqrt{n\ell}), \Omega(n - \ell)\}$. Since $\ell \le n/2$, this simplifies to

$$Q^*_{1/3}(D) \ge \Omega\!\left(\sqrt{n\,\ell_0(D)}\right). \tag{9.5}$$

If $\ell_1(D) \ne 0$, set $\ell = n - \ell_1(D) + 1 \ge n/2$ and note that $D(\ell) \ne D(\ell - 1)$ as before. By Theorem 9.2,

$$Q^*_{1/3}(D) \ge \Omega(\ell_1(D)). \tag{9.6}$$

The theorem follows from (9.5) and (9.6). □
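The quantities $\ell_0(D)$ and $\ell_1(D)$ in Theorem 1.3 are determined by the maximal constant run of $D$ that contains $\lfloor n/2\rfloor$, since every admissible interval $[\ell_0, n - \ell_1]$ contains that point. The Python sketch below (ours) computes them from the list of values $D(0), \ldots, D(n)$. The two predicates in the usage lines are illustrative: a disjointness-type predicate that changes value only between 0 and 1, and the parity predicate.

```python
from math import sqrt

def ell0_ell1(D):
    """Return (l0, l1) for a predicate given as the list [D(0), ..., D(n)] of +-1 values."""
    n = len(D) - 1
    mid = n // 2                         # lies in every interval [l0, n - l1]
    lo = hi = mid
    while lo > 0 and D[lo - 1] == D[mid]:
        lo -= 1
    while hi < n and D[hi + 1] == D[mid]:
        hi += 1
    return lo, n - hi

n = 100
disj_like = [1] + [-1] * n               # changes value only between 0 and 1
parity = [(-1) ** k for k in range(n + 1)]
for D in (disj_like, parity):
    l0, l1 = ell0_ell1(D)
    print(l0, l1, sqrt(n * l0) + l1)     # bound of Theorem 1.3, up to the constant
```

For the disjointness-type predicate this gives $\ell_0 = 1$ and $\ell_1 = 0$, hence the familiar $\Omega(\sqrt n)$. For parity both quantities are about $n/2$, giving $\Omega(n)$.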



10. Application: discrepancy of constant-depth circuits. As another application of the pattern matrix method, we revisit the discrepancy of AC$^0$, the class of polynomial-size constant-depth circuits with AND, OR, NOT gates. In an earlier work [60], we obtained the first exponentially small upper bound on the discrepancy of a function in AC$^0$, with applications to threshold circuits. Independently, Buhrman et al. [11] exhibited another function in AC$^0$ with exponentially small discrepancy. We revisit these two discrepancy bounds below, considerably sharpening the bound in [60] and giving a new and simple proof of the bound in [11].

Consider the function $\mathrm{MP}_m\colon \{0,1\}^{4m^3} \to \{-1,+1\}$ given by

$$\mathrm{MP}_m(x) = \bigwedge_{i=1}^{m}\bigvee_{j=1}^{4m^2}x_{ij}.$$

This function was originally defined and studied by Minsky and Papert [45] in their seminal monograph on perceptrons. Using this function and the Degree/Discrepancy Theorem (Theorem 7.4), an upper bound of $\exp\{-\Omega(n^{1/5})\}$ was derived in [60] on the discrepancy of an explicit AC$^0$ circuit $f\colon \{0,1\}^n \times \{0,1\}^n \to \{-1,+1\}$ of depth 3. We will now sharpen that bound to $\exp\{-\Omega(n^{1/3})\}$.

Theorem 1.6 (restated from p. 5). Let $f(x,y) = \mathrm{MP}_m(x \vee y)$. Then

$$\operatorname{disc}(f) = \exp\{-\Omega(m)\}.$$

Proof. Put $d = \lfloor m/2\rfloor$. A well-known result of Minsky and Papert [45] states that $\deg_\pm(\mathrm{MP}_d) \ge d$. Since the $(8d^3, 4d^3, \mathrm{MP}_d)$-pattern matrix is a submatrix of $[f(x,y)]_{x,y}$, the proof is complete in view of equation (7.3) of Theorem 7.3. □

We now turn to the result of Buhrman et al. The ODD-MAX-BIT function $\mathrm{OMB}_n\colon \{0,1\}^n \to \{-1,+1\}$, due to Beigel [7], is given by

$$\mathrm{OMB}_n(x) = \operatorname{sgn}\left(1 + \sum_{i=1}^{n}(-2)^i x_i\right). \tag{10.1}$$
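The representation (10.1) also makes the decision-list structure of $\mathrm{OMB}_n$ visible. The term for the highest-indexed 1 bit dominates the sum, so the output is $+1$ if that index is even (or if $x = 0$) and $-1$ if it is odd. The Python sketch below (ours, with $x = (x_1, \ldots, x_n)$ stored 0-based) checks this reading against (10.1) for all inputs with $n \le 10$.

```python
from itertools import product

def omb_threshold(x):
    """(10.1): sgn(1 + sum_{i=1}^n (-2)^i x_i); the sum is odd, hence never zero."""
    s = 1 + sum((-2) ** i * xi for i, xi in enumerate(x, start=1))
    return 1 if s > 0 else -1

def omb_decision_list(x):
    """Scan from the highest index down; the first 1 found at index i decides (-1)^i."""
    for i in range(len(x), 0, -1):
        if x[i - 1]:
            return 1 if i % 2 == 0 else -1
    return 1          # all-zero input: the constant term 1 decides

print(all(omb_threshold(x) == omb_decision_list(x)
          for n in range(1, 11) for x in product((0, 1), repeat=n)))  # True
```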

It is straightforward to compute $\mathrm{OMB}_n$ by a linear-size DNF formula and even a decision list. In particular, $\mathrm{OMB}_n$ belongs to the class AC$^0$. Buhrman et al. [11, §3.2] proved the following result.

Theorem 10.1 (Buhrman et al.). Let $f(x,y) = \mathrm{OMB}_n(x \wedge y)$. Then

$$\operatorname{disc}(f) = \exp\{-\Omega(n^{1/3})\}.$$

Using the results of this paper, we can give a short alternate proof of this theorem.

Proof. Put $m = \lfloor n/4\rfloor$. A well-known result due to Beigel [7] shows that $W(\mathrm{OMB}_m, cm^{1/3}) \ge \exp(cm^{1/3})$ for some absolute constant $c > 0$. Since the $(2m, m, \mathrm{OMB}_m)$-pattern matrix is a submatrix of $[f(x,y)]_{x,y}$, the proof is complete by Theorem 7.3. □

Remark 10.2. The above proofs illustrate that the characterization of the discrepancy of pattern matrices in this paper (Theorem 7.3) is a substantial improvement on our earlier result (Theorem 7.4). In particular, the representation (10.1) makes it clear that $\deg_\pm(\mathrm{OMB}_n) = 1$, and therefore Theorem 7.4 cannot yield an upper bound better than $n^{-O(1)}$ on the discrepancy of $\mathrm{OMB}_n(x \wedge y)$. Theorem 7.3, on the other hand, gives an exponentially better upper bound.

It is well known [23, …] that the discrepancy of a function $f$ implies a lower bound on the size of depth-2 majority circuits that compute $f$. Following [60], we record the consequences of Theorems 1.6 and 10.1 in this regard.

Theorem 10.3. Any majority vote of threshold gates that computes the function

$$f(x,y) = MP_m(x \vee y)$$

has size $\exp\{\Omega(m)\}$. Analogously, any majority vote of threshold gates that computes the function

$$f(x,y) = OMB_n(x \wedge y)$$

has size $\exp\{\Omega(n^{1/3})\}$.

Proof. Analogous to the proof given in [30, §7]. □

11. Pattern matrices and the log-rank conjecture. In previous sections, 
we characterized various matrix-analytic and combinatorial properties of pattern ma- 
trices, including their classical and quantum communication complexity, discrepancy, 
approximate rank, and approximate trace norm. We conclude this study with another 
interesting fact about pattern matrices. Specifically, we show that they satisfy the 
well-known log-rank conjecture [40, p. 26].

In a seminal paper, Mehlhorn and Schmidt [44] observed that the deterministic communication complexity of a sign matrix $F$ satisfies $D(F) \geq \log \operatorname{rk} F$. The log-rank conjecture is that this lower bound is always tight up to a polynomial factor, i.e., $D(F) \leq (\log \operatorname{rk} F)^{O(1)}$. Using the results of the previous sections, we can give a short proof of this hypothesis in the case of pattern matrices.

Theorem 11.1 (on the log-rank conjecture). Let $f\colon \{0,1\}^t \to \{-1,+1\}$ be a given function, $d = \deg(f)$. Let $F$ be the $(n,t,f)$-pattern matrix. Then

$$\operatorname{rk} F \geq \left(\frac{n}{t}\right)^{d} \geq \exp\{\Omega(D(F)^{1/4})\}. \qquad (11.1)$$

In particular, $F$ satisfies the log-rank conjecture.

Proof. Since $\hat f(S) \neq 0$ for some set $S$ with $|S| = d$, Theorem 4.3 implies that $F$ has at least $(n/t)^d$ nonzero singular values. This settles the first inequality in (11.1).

Proposition 5.1 implies that $D(F) \leq O(\operatorname{dt}(f)\log(n/t))$, where $\operatorname{dt}(f)$ denotes the least depth of a decision tree for $f$. Nisan and Smolensky [10, Thm. 12] prove that $\operatorname{dt}(f) \leq 2\deg(f)^4$ for all $f$. Combining these two observations establishes the second inequality in (11.1). □
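The first inequality in (11.1) is easy to probe numerically. The Python sketch below assumes the pattern-matrix convention used in this paper, with $[n]$ split into $t$ consecutive blocks of size $n/t$, $V$ ranging over the sets that contain exactly one index from each block, and entries $f(x|_V \oplus w)$; all function names are ours. Counting the distinct characters $\chi_U(x)$ spanned by the columns shows that $\operatorname{rk} F = \sum_{S : \hat f(S) \neq 0} (n/t)^{|S|}$, which is at least $(n/t)^{\deg f}$; the sketch compares this count with NumPy's numerical rank on tiny instances.

```python
import itertools
import numpy as np

def chi(S, z):
    # Character chi_S(z) = (-1)^{sum_{i in S} z_i} on {0,1}^t.
    return -1 if sum(z[i] for i in S) % 2 else 1

def fourier(f, t):
    # Brute-force Fourier coefficients: hat f(S) = E_z[f(z) chi_S(z)].
    pts = list(itertools.product((0, 1), repeat=t))
    return {S: sum(f(z) * chi(S, z) for z in pts) / 2 ** t
            for r in range(t + 1)
            for S in itertools.combinations(range(t), r)}

def pattern_matrix(f, n, t):
    # (n, t, f)-pattern matrix; assumes t divides n and consecutive blocks.
    k = n // t
    blocks = [range(i * k, (i + 1) * k) for i in range(t)]
    cols = [(V, w)
            for V in itertools.product(*blocks)            # one index per block
            for w in itertools.product((0, 1), repeat=t)]  # shift vector w
    return np.array([[f(tuple(x[v] ^ b for v, b in zip(V, w))) for V, w in cols]
                     for x in itertools.product((0, 1), repeat=n)], dtype=float)

def support(f, t):
    return [S for S, c in fourier(f, t).items() if abs(c) > 1e-9]

def predicted_rank(f, n, t):
    # Sum of (n/t)^{|S|} over the Fourier support of f.
    return sum((n // t) ** len(S) for S in support(f, t))

def degree(f, t):
    return max(len(S) for S in support(f, t))

# Usage: majority on 3 bits (+1 iff at most one input bit is 1), n = 6, t = 3.
def maj3(z):
    return 1 if sum(z) <= 1 else -1

F = pattern_matrix(maj3, 6, 3)                                # 64 x 64
print(np.linalg.matrix_rank(F), predicted_rank(maj3, 6, 3))   # 14 14
```

Brute force is exponential in $n$, so this is only a check on toy sizes, not a way to compute ranks of realistic pattern matrices.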

12. Related work. Shi and Zhu independently obtained a result related to 



our lower bound (5.2) on bounded-error communication. Fix functions $f\colon \{0,1\}^n \to \{-1,+1\}$ and $g\colon \{0,1\}^k \times \{0,1\}^k \to \{0,1\}$. Let $f \circ g^n$ denote the composition of $f$ with $n$ independent copies of $g$. More formally, the function $f \circ g^n\colon \{0,1\}^{nk} \times \{0,1\}^{nk} \to \{-1,+1\}$ is given by

$$(f \circ g^n)(x,y) = f\bigl(g(x^{(1)}, y^{(1)}), \dots, g(x^{(n)}, y^{(n)})\bigr),$$

where $x = (x^{(1)}, \dots, x^{(n)}) \in \{0,1\}^{nk}$ and $y = (y^{(1)}, \dots, y^{(n)}) \in \{0,1\}^{nk}$. Shi and Zhu study the communication complexity of $f \circ g^n$. Their main result [62, Lem. 3.5] is that

$$Q_{1/3}(f \circ g^n) \geq \Omega(\deg_{1/3}(f)) \quad \text{provided that } \rho(g) < \cdots,$$

where $\rho(g)$ is a new variant of discrepancy that the authors introduce. As an illustration, they re-prove a weaker version of Razborov's lower bounds in Theorem 1.3. In our terminology (Section 2.4), their proof also fits in the framework of the Klauck-Razborov generalized discrepancy method.
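The block composition $f \circ g^n$ defined above can be made concrete with a short Python sketch. It assumes, as in the definition, that $x^{(1)}, \dots, x^{(n)}$ are consecutive length-$k$ segments of $x$ (and likewise for $y$); the helper names and the illustrative choice of $f$ (parity in $\pm 1$ form) and $g$ (inner product mod 2) are ours and not taken from the text.

```python
from itertools import product

def compose(f, g, n, k):
    # Block composition f o g^n: split x and y (length n*k each) into n
    # consecutive blocks of length k, apply g to each block pair, then f.
    def h(x, y):
        assert len(x) == len(y) == n * k
        z = tuple(g(x[i * k:(i + 1) * k], y[i * k:(i + 1) * k]) for i in range(n))
        return f(z)
    return h

# Illustrative choices (hypothetical, not from the text):
# f = parity in +-1 form, g = inner product mod 2 on k-bit blocks.
def parity_pm(z):
    return -1 if sum(z) % 2 else 1

def ip2(u, v):
    return sum(a & b for a, b in zip(u, v)) % 2

h = compose(parity_pm, ip2, n=3, k=2)
# Parity of the blockwise inner products equals the inner product on all 6 bits.
assert all(h(x, y) == (-1) ** ip2(x, y)
           for x in product((0, 1), repeat=6) for y in product((0, 1), repeat=6))
```

Note that $g$ outputs bits in $\{0,1\}$ while $f$ outputs signs in $\{-1,+1\}$, matching the types in the definition of $f \circ g^n$.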

Shi and Zhu's result revolves around the quantity $\rho(g)$, which needs to be small. This poses two complications. First, the function $g$ will generally need to depend on many variables, from $k = \Theta(\log n)$ to $k = n^{\Theta(1)}$, which weakens the final lower bounds on communication. For example, the lower bounds obtained in [62] for symmetric functions are polynomially weaker than optimal (Theorem 1.3).

A second complication, as the authors note, is that "estimating $\rho(g)$ is unfortunately difficult in general" [62, §4.1]. For example, re-proving Razborov's lower bounds reduces to estimating $\rho(g)$ for $g(x,y) = x_1y_1 \vee \cdots \vee x_ky_k$. Shi and Zhu accomplish this using Hahn matrices, an advanced tool that is the centerpiece of Razborov's own proof (Razborov's use of Hahn matrices is somewhat more demanding).

Our method avoids these complications altogether. For example, we prove (by taking $n = 2t$ in the pattern matrix method, Theorem 1.1) that

$$Q^*_{1/3}(f \circ g^n) \geq \Omega(\deg_{1/3}(f))$$
for any function $g\colon \{0,1\}^k \times \{0,1\}^k \to \{0,1\}$ such that the matrix $[g(x,y)]_{x,y}$ contains the following submatrix, up to permutations of rows and columns:

$$\begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}.$$

To illustrate, one can take $g$ to be

$$g(x,y) = x_1y_1 \vee x_2y_2 \vee x_3y_3 \vee x_4y_4$$

or

$$g(x,y) = x_1 y_1 y_2 \vee \overline{x_1}\, y_1 \overline{y_2} \vee x_2 \overline{y_1} y_2 \vee \overline{x_2}\, \overline{y_1}\, \overline{y_2}.$$
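Whether a given gadget $g$ qualifies can be checked by brute force when $k$ is small. The Python sketch below (helper names are ours; the search is exponential and only meant for tiny truth tables) tests whether $[g(x,y)]_{x,y}$ contains the $4 \times 4$ matrix above up to row and column permutations, and confirms it for both choices of $g$ given in the text.

```python
from itertools import permutations, product

# The 4 x 4 matrix displayed in the text.
TARGET = ((1, 0, 1, 0), (1, 0, 0, 1), (0, 1, 1, 0), (0, 1, 0, 1))

def truth_matrix(g, k):
    pts = list(product((0, 1), repeat=k))
    return [[g(x, y) for y in pts] for x in pts]

def contains_up_to_perm(M, target=TARGET):
    # Try every ordered choice of 4 rows; the target's columns are pairwise
    # distinct, so it suffices that each of them occurs among M's columns
    # restricted to the chosen rows.
    tcols = [tuple(row[j] for row in target) for j in range(len(target[0]))]
    for rows in permutations(range(len(M)), len(target)):
        present = {tuple(M[r][c] for r in rows) for c in range(len(M[0]))}
        if all(tc in present for tc in tcols):
            return True
    return False

def g_or4(x, y):   # x1y1 v x2y2 v x3y3 v x4y4
    return int(any(a and b for a, b in zip(x, y)))

def g_two(x, y):   # x1 y1 y2 v ~x1 y1 ~y2 v x2 ~y1 y2 v ~x2 ~y1 ~y2
    (x1, x2), (y1, y2) = x, y
    return int((x1 and y1 and y2) or (not x1 and y1 and not y2)
               or (x2 and not y1 and y2) or (not x2 and not y1 and not y2))

print(contains_up_to_perm(truth_matrix(g_or4, 4)),
      contains_up_to_perm(truth_matrix(g_two, 2)))   # True True
```

For the first gadget, the rows $x \in \{1010, 1001, 0110, 0101\}$ and the unit-vector columns $y = e_1, \dots, e_4$ give the target; for the second, the full $4 \times 4$ truth table is the target with its rows reordered.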



In summary, there is a simple function g on k = 2 variables that works universally for 
all $f$. This means no technical conditions to check, such as $\rho(g)$, and no blow-up in
the number of variables. As a result, we are able to re-prove Razborov's optimal lower 
bounds exactly. Moreover, the technical machinery of this paper is self-contained and 
disjoint from Razborov's proof. 

A further advantage of the pattern matrix method is that it extends in a 
straightforward way to the multiparty model 2TJ[TH[TS1[THJ[S]. This extension depends 
on the fact that the rows of a pattern matrix are applications of the same function 
to different subsets of the variables. In the general context of block composition, it is 
unclear how to carry out this extension. Further details can be found in the survey [61].

These considerations do not diminish the technical merit of Shi and Zhu's method, which is of much interest. The proofs in [62] and this paper start out with the same duality transformation (Theorem 3.2) but diverge substantially from then on, which explains the differences in our results. Specifically, we introduce and analyze pattern matrices, while Shi and Zhu construct a much different family of matrices.
Appendix A. On uniform approximation and sign-representation. The purpose of this appendix is to prove Theorem 2.5 on the representation of a Boolean function by real versus integer polynomials. Similar statements have been noted earlier by several authors [38, …]. We derive our result by modifying a recent analysis due to Buhrman et al. [11, Cor. 1].

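Theorem 2.5 (restated from p. 10). Let $f\colon \{0,1\}^n \to \{-1,+1\}$ be given. Then for $d = 0, 1, \dots, n$,

$$\frac{1}{1 - E(f,d)} \leq W(f,d) \leq \frac{2}{1 - E(f,d)}\left(\sum_{i=0}^{d}\binom{n}{i}\right)^{3/2},$$

with the convention that $1/0 = \infty$.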

Proof. One readily verifies that $W(f,d) = \infty$ if and only if $E(f,d) = 1$. In what follows, we focus on the complementary case when $W(f,d) < \infty$ and $E(f,d) < 1$.

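For the lower bound on $W(f,d)$, fix integers $\lambda_S$ with $\sum_{|S| \leq d} |\lambda_S| = W(f,d)$ such that the polynomial $p(x) = \sum_{|S| \leq d} \lambda_S \chi_S(x)$ satisfies $f(x) \equiv \operatorname{sgn} p(x)$. Then $1 \leq f(x)p(x) \leq W(f,d)$ and therefore

$$E(f,d) \leq \left\| f - \frac{p}{W(f,d)} \right\|_\infty \leq 1 - \frac{1}{W(f,d)}.$$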

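To prove the upper bound on $W(f,d)$, fix any degree-$d$ polynomial $p$ such that $\|f - p\|_\infty = E(f,d)$. Define $\delta = 1 - E(f,d) > 0$ and $N = \sum_{i=0}^{d} \binom{n}{i}$. For a real $t$, let $\operatorname{rnd} t$ be the result of rounding $t$ to the closest integer, so that $|t - \operatorname{rnd} t| \leq 1/2$. We claim that the polynomial

$$q(x) = \sum_{|S| \leq d} \operatorname{rnd}(M\hat p(S))\,\chi_S(x),$$

where $M = 3N/(4\delta)$, satisfies $f(x) \equiv \operatorname{sgn} q(x)$. Indeed,

$$\left| f(x) - \frac{q(x)}{M} \right| \leq |f(x) - p(x)| + \frac{1}{M}|Mp(x) - q(x)| \leq 1 - \delta + \frac{1}{M}\sum_{|S| \leq d} |M\hat p(S) - \operatorname{rnd}(M\hat p(S))| \leq 1 - \delta + \frac{N}{2M} = 1 - \frac{\delta}{3} < 1.$$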
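It remains to examine the sum of the coefficients of $q$. We have:

$$\sum_{|S| \leq d} |\operatorname{rnd}(M\hat p(S))| \leq \frac{1}{2}N + M\sum_{|S| \leq d} |\hat p(S)| \leq \frac{1}{2}N + M\sqrt{N}\left(\mathop{\mathbf{E}}_x\left[p(x)^2\right]\right)^{1/2} \leq \frac{2N\sqrt{N}}{\delta},$$

where the second step follows by an application of the Cauchy-Schwarz inequality and Parseval's identity (2.1), and the third step uses $|p(x)| \leq |f(x)| + E(f,d) < 2$. □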
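The rounding step in the proof above can be traced on a toy instance. The Python sketch below (names are ours; the choice of $f$, $d$ and of the approximant $p$ are illustrative assumptions) takes any degree-$d$ polynomial $p$ with $\|f - p\|_\infty < 1$, which need not achieve $E(f,d)$, sets $\delta = 1 - \|f - p\|_\infty$, $N$ and $M = 3N/(4\delta)$ as in the proof, rounds the scaled coefficients, and checks that the resulting integer polynomial $q$ sign-represents $f$ with total weight at most $2N^{3/2}/\delta$.

```python
import itertools
import math

def chi(S, x):
    # Character chi_S(x) = (-1)^{sum_{i in S} x_i} on {0,1}^n.
    return -1 if sum(x[i] for i in S) % 2 else 1

def rounding_step(f, p_hat, n, d):
    """Sketch of the rounding step in the proof of Theorem 2.5.

    p_hat maps sets S with |S| <= d to real coefficients of p = sum p_hat[S] chi_S.
    Assumes ||f - p||_inf < 1; p need not be a best approximation, in which
    case delta below is smaller than 1 - E(f, d).
    """
    pts = list(itertools.product((0, 1), repeat=n))
    sets = [S for r in range(d + 1) for S in itertools.combinations(range(n), r)]

    def p(x):
        return sum(p_hat.get(S, 0.0) * chi(S, x) for S in sets)

    delta = 1 - max(abs(f(x) - p(x)) for x in pts)
    assert delta > 0, "need ||f - p||_inf < 1"
    N = sum(math.comb(n, i) for i in range(d + 1))
    M = 3 * N / (4 * delta)
    # Round each scaled coefficient to a nearest integer, so |t - round(t)| <= 1/2.
    q_hat = {S: round(M * p_hat.get(S, 0.0)) for S in sets}

    def q(x):
        return sum(c * chi(S, x) for S, c in q_hat.items())

    sign_ok = all(q(x) != 0 and (q(x) > 0) == (f(x) > 0) for x in pts)
    weight = sum(abs(c) for c in q_hat.values())
    return q_hat, sign_ok, weight, 2 * N ** 1.5 / delta

# Usage: majority on 3 bits (+1 iff at most one x_i is 1), d = 1, and p its
# degree-1 Fourier part (chi_1 + chi_2 + chi_3)/2, whose error is exactly 1/2.
def maj3(x):
    return 1 if sum(x) <= 1 else -1

q_hat, sign_ok, weight, bound = rounding_step(maj3, {(0,): 0.5, (1,): 0.5, (2,): 0.5}, 3, 1)
print(sign_ok, weight)  # True 9
```

Here $N = 4$, $\delta = 1/2$ and $M = 6$, so $q = 3(\chi_1 + \chi_2 + \chi_3)$ has weight $9$, well under the bound $2N^{3/2}/\delta = 32$.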



